
PowerPC 970 Redux: Dialogue and Addendum




MacRumors
Jul 21, 2003, 08:27 AM
Ars Technica posted (http://www.arstechnica.com/cpu/03q2/ppc970-interview/ppc970-interview-1.html) more details on the PowerPC 970 after chatting with two of the IBM engineers behind the chip.

The interview was aimed primarily at filling any lingering gaps in my previous 970 coverage, i.e. corrections, clarifications, and answering some of the questions raised by the articles. So to that end, we talked mostly about the 970's microarchitecture and specifically about the 970's VMX (a.k.a. Altivec) unit, which turns out to be a lot more flexible and robust than what I'd initially described in my articles.

While forward-looking topics were generally excluded from the interview, the article reveals more technical details regarding the PowerPC 970.



nuckinfutz
Jul 21, 2003, 08:33 AM
Good news on the Altivec clarifications. I'll read the article in its entirety.

unreg
Jul 21, 2003, 08:57 AM
From what I have read here and elsewhere, an optimized compiler can speed up the overall system almost as much as new hardware. This is really good news. As a side benefit, code will need less hand-tweaking to get good results.

jamall
Jul 21, 2003, 09:27 AM
This news will have the execs over at Intel shaking a little more (well, at least sitting up and paying attention). Performance gains from optimisation of the compiler, combined with a shift to the smaller 0.09 micron production process, and perhaps longer term, improvements to the VMX unit and integration of the memory controller, all point to a bright future for PPC computing. Perhaps an extended stay at the top of the pecking order is what Apple needs to win some of that seemingly elusive market share...

ldjessee
Jul 21, 2003, 11:25 AM
I am not sure why the author at Arstechnica is not posting the whole conversation.

Why edit it and chop it up?

Sure, it makes it easier to digest, but it just leaves too much room for the author's view on what the comments meant. Why not let us read it and let us make that evaluation?

Snowy_River
Jul 21, 2003, 11:36 AM
I've got to say, the more I read about the 970 since WWDC the more excited I am about its arrival. This chip looks to usher in a new era for Apple, the era that we hoped was here with the G3, until the G4 fiasco crushed those hopes.

"The future's so bright, I have to wear shades..."

MrMacMan
Jul 21, 2003, 11:50 AM
OMG, I read the original article when it came out. I know some of you guys can read Swahili, but I can't.

The specifications were like 'how does the 2nd pipeline link to the 3rd during the fetch/decode sequence?'
Me: 'What the hell am I reading?'
Them: 'the vector unit will be processed by second stage ...'
Me: WHAT IS THIS?!?!?! :rolleyes:

I mean I skipped like 1/3 of the article.
Anyone who can read that, I take my hat off to you.

Rincewind42
Jul 21, 2003, 12:54 PM
Originally posted by ldjessee
I am not sure why the author at Arstechnica is not posting the whole conversation.

Why edit it and chop it up?

Sure, it makes it easier to digest, but it just leaves too much room for the author's view on what the comments meant. Why not let us read it and let us make that evaluation?

Probably because there were comments made that they asked him not to print, because there were details that were off topic, or simply because the order of the original conversation made little or no sense. Or all 3 to some degree. The reality is he said he won't post it, so there's no reason to worry about it.

jaedreth
Jul 21, 2003, 01:25 PM
This is very encouraging news. Not only is the VMX far more flexible than previously thought, it's been made painfully clear that whatever is shipping with the G5s this year is nowhere near optimized for the machine. The compiler itself needs to be optimized for this specific architecture, because this architecture "breaks the customs" of most industry designs.

That means that as they develop and perfect an optimized version of GCC or something else, developers can start using this new optimized compiler to create optimized code, which will then show even more performance gains from the same hardware.

So the G5's future looks brighter and brighter.

Can't wait to get mine.

Jaedreth

sparkleytone
Jul 21, 2003, 01:57 PM
This cleared up some confusion I was having about the FSB speed. The clock multiplier doesn't always have to be 2x; the engineers said there are other multipliers, such as 3, 4, and 6. This basically means that they don't have to limit processor speed just because the FSB can't run at 1.5GHz or whatnot.
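A minimal sketch of the ratio arithmetic, assuming the bus clock is simply the core clock divided by a fixed ratio (the real bus interface is more involved). The 2, 3, 4 and 6 ratios come from the post above; the function name and example clocks are illustrative only.

```python
def bus_clock_mhz(core_mhz: float, divider: int) -> float:
    """Front-side bus clock for a given core clock and core:bus ratio.

    Assumes a simple fixed divider; this is a simplification, not IBM's design.
    """
    if divider <= 0:
        raise ValueError("divider must be positive")
    return core_mhz / divider


# Ratios mentioned by the engineers, applied to a hypothetical 3GHz core.
for divider in (2, 3, 4, 6):
    print(f"3000 MHz core at {divider}:1 -> {bus_clock_mhz(3000, divider):.1f} MHz bus")
```

At 2:1 a 3GHz core would need a 1.5GHz bus, the limit described above; a 3:1 or 4:1 ratio brings that down to 1GHz or 750MHz.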

ddtlm
Jul 21, 2003, 03:16 PM
jaedreth:

it's been made painfully clear that whatever is shipping with the G5s this year is nowhere near optimized for the machine
Whoa, this is how wild rumors and crazy expectations get started. They said that GCC wasn't entirely happy with the G5, but that's a lot better than "nowhere near" optimized.

ldjessee:

Why not let us read it and let us make that evaluation?
The author likely knew that if he just threw the facts out there that some users would get entirely the wrong idea about things.

MrMacman:

Anyone who can read that, I take my hat off to you.
Just take some processor architecture classes from some university, then you can keep your hat on.

Cubeboy
Jul 21, 2003, 03:49 PM
Originally posted by jaedreth
This is very encouraging news. Not only is the VMX far more flexible than previously thought, it's been made painfully clear that whatever is shipping with the G5s this year is nowhere near optimized for the machine. The compiler itself needs to be optimized for this specific architecture, because this architecture "breaks the customs" of most industry designs.

That means that as they develop and perfect an optimized version of GCC or something else, developers can start using this new optimized compiler to create optimized code, which will then show even more performance gains from the same hardware.

So the G5's future looks brighter and brighter.

Can't wait to get mine.

Jaedreth

AIX 5.0 seems to work well with the G5/PPC970 too, at least for SPEC anyway. The GCC compiler used by Apple was already optimized beyond standard GCC, hence the correct values in the machine description (md) file, the modified scheduler, and whatever other changes were made before WWDC.

"As it turns out, they were in fact wrong before WWDC, but the version of gcc that Apple was using for WWDC had the correct values in it. (I'd imagine that this pre-WWDC obfuscation of the 970's vector latencies was done deliberately, but I forgot to ask him about that.)"

"The gcc scheduler is not really designed ideally for a processor like the 970 and the Power4 and others, and that's a lot of what the IBM and Apple teams have worked on".

And my favorite quote:
"IBM is not gonna try to compete with Apple's reality distortion field :)"

EDIT:

It should be noted that GCC 3.3 is not able to schedule for the Pentium 4 at all and has to rely on -mcpu=pentium4, which really only tweaks some instruction costs for a few very specific cases. Well-scheduled code is critical for nearly all floating-point code.

The other thing I've found is that the Pentium 4 doesn't have enough registers for -march=pentium4 to do well, since the target machine GCC is designed around is a 32-register RISC chip. I'm not sure exactly how much of a performance hit this causes, but an Opteron, which does have enough registers, scores much closer to ICC when running GCC-compiled code, despite having SSE/SSE2 (and we all know ICC is much better at discovering vectorization opportunities than GCC).

C14ru5
Jul 21, 2003, 04:49 PM
From the original article
Hannibal: So what are you guys using in the blades, then--the 970 blades? A chipset of your own design?

Peter Sandon: I haven't kept up with what we've announced about that, so I guess I don't know.
Aha! So THAT'S where the alien technology that Steve acquired was put to good use :D

wizard
Jul 21, 2003, 04:52 PM
Originally posted by jamall
This news will have the execs over at Intel shaking a little more (well, at least sitting up and paying attention). Performance gains from optimisation of the compiler, combined with a shift to the smaller 0.09 micron production process, and perhaps longer term, improvements to the VMX unit and integration of the memory controller, all point to a bright future for PPC computing. Perhaps an extended stay at the top of the pecking order is what Apple needs to win some of that seemingly elusive market share...

Somehow I just don't think Intel is worried. If Apple quadrupled its market share, that might be a different story, but right now I can't see Intel worrying about a market it doesn't compete in anyway.

If Apple came up with a Switch campaign that actually worked, that might start to worry Intel. But they need more than a Switch gimmick; they really need to market the positive aspects of their operating system.

I do have to agree the future looks fantastic. Now if they (Apple & IBM) can just move forward with improved 970s to hold the fort until the 980s arrive, we will be all set.

Dave

DrugsBunny
Jul 21, 2003, 07:32 PM
Intel should be worried because they sell CPUs, and IBM has produced something cheaper and faster in its first incarnation, running code produced by a hacked copy of gcc. Intel has more or less reached the limits of what it can do with the P4 (it can't just keep ramping up the clock speed forever), while IBM has only just started. Itanium2 is DOA for the desktop. The days of Mac users and the PowerPC faithful waiting for anaemic increments to G4 clock speeds on a bandwidth-starved bus from Motorola are over. Folks, welcome to flavor country.

8thDegreeSavage
Jul 21, 2003, 09:23 PM
*In best Homer voice* "I'm in flavor country"

Sun Baked
Jul 21, 2003, 09:42 PM
Notice, no real response to the PPC 970 White Box comment.

For the PPC White Box crowd, PPC 970 machines would be a welcome addition to the Mai Logic and Pegasos G3/G4 motherboards (and others).

More desktop PPC machines does help Apple out, and will keep IBM focused on improving the product. It's been a long road to recovery for this class of machine since MS killed Windows for PPC.

It should be interesting to see if this keeps the Amiga on PPC alive, or if they bail and head over to x86. But Amiga has been working with MS lately. :rolleyes:

websterphreaky
Jul 21, 2003, 10:13 PM
Wonder why MacRumors isn't reporting that IBM is having REAL BIG problems at the Fishkill plant on the 970 production line??

You may not be getting those G5s after all, or they may be greatly delayed.

[mod. edit - Next time search the forums before you try to troll. That story (http://forums.macrumors.com/showthread.php?s=&threadid=32561) was posted last week when it first came out.]

GetSome681
Jul 21, 2003, 10:37 PM
Originally posted by websterphreaky
Wonder why MacRumors isn't reporting that IBM is having REAL BIG problems at the Fishkill plant on the 970 production line??

You may not be getting those G5s after all, or they may be greatly delayed.


Wow, too bad that was posted here about a week ago. And for those of you who don't understand business-speak: that article has nothing to do with production problems in the sense of you not being able to get your G5s. It's about the plant losing money, since it isn't producing enough to turn a profit on the investment. This will change, though. Keep your panties on.

MrMacMan
Jul 21, 2003, 11:09 PM
Originally posted by ddtlm

MrMacman:


Just take some processor architecture classes from some university, then you can keep your hat on.
Hey, I think that can wait a few years.

I'm not exactly in college; I don't think learning how processors work will help me...

And if the G5 comes out on schedule we will all be good.

ddtlm
Jul 21, 2003, 11:20 PM
DrugsBunny:

Optimism aside, I see no reason to believe any of what you claim. People predict the death of x86 just about as often as they predict the death of Apple... has either happened?

IBM has produced something cheaper and faster in its first incarnation
"Power4" has a "4" in its name because the generation before it had a "3" in its name.

running code produced by a hacked copy of gcc
Great, now everyone is gonna blame GCC for every one of Apple's failings, just like Moto was/is blamed. Intel gets a higher benchmark score? No problem, just blame GCC!

DrugsBunny
Jul 22, 2003, 12:04 AM
Note that I never predicted the death of the x86. I predict that Intel will no longer be able to get much out of the P4 core. i.e. using a thinner process and clocking it faster only gets you so far, then you need to use a new core design.

As far as first incarnation is concerned, the 4 (as in Power4) designation is irrelevant. Recall that the G4 first appeared as a 500MHz chip (I think). That's what I call a first incarnation. Subsequent revisions (7400 vs. 7450 vs. 7457, each at various clock speeds) increased the clock speed and had minor internal differences. In that sense, the PPC970 is the "first version" of that class of chip.

Again, at no stage did I blame all of Apple's failings on gcc or Moto. And had Apple run the SPEC tests using Intel's own compiler on the P4, the SPECint tests would have shown the P4 winning by an even larger margin. The PPC970 is a large beast of a processor and can handle a lot of in-flight instructions (216?) at a time. The proviso is that there are constraints on which instructions can be issued together in a given group. gcc is designed around a model of a processor with a pipeline, and it reorders code on a per-instruction basis to avoid stalls. It's totally ignorant of this whole group issue/dispatch business. The hack to make it work is to fake out what might be an equivalent pipeline and make some educated guesses about the code that should be generated for the 970. It works, but it's not the ideal situation.
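To make the "group" idea concrete, here is a toy model under deliberately simplified rules that are assumptions, not IBM's actual dispatch constraints: instructions are packed in program order into groups of at most four ordinary instructions plus one branch, and a branch always closes its group. The function name and instruction labels are hypothetical.

```python
from typing import List

# Assumed toy rule: up to four ordinary instructions per group,
# plus one extra slot that only a branch may occupy.
MAX_NON_BRANCH = 4


def form_groups(ops: List[str]) -> List[List[str]]:
    """Pack instructions (in program order) into dispatch groups under the toy rules."""
    groups: List[List[str]] = []
    current: List[str] = []
    for op in ops:
        if op == "branch":
            current.append(op)  # a branch takes the branch slot and ends the group
            groups.append(current)
            current = []
            continue
        if len(current) == MAX_NON_BRANCH:
            groups.append(current)  # ordinary slots full: start a new group
            current = []
        current.append(op)
    if current:
        groups.append(current)
    return groups


dense = ["add"] * 8 + ["branch"]  # 9 instructions
sparse = ["add", "branch"] * 3    # 6 instructions, branch-heavy
print(len(form_groups(dense)), "groups for", len(dense), "instructions")
print(len(form_groups(sparse)), "groups for", len(sparse), "instructions")
```

Even in this toy version, the shape of the code, not just the order of individual instructions, decides how full each group is, which is the part a per-instruction pipeline scheduler doesn't model.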

Snowy_River
Jul 22, 2003, 12:29 AM
Originally posted by DrugsBunny
Note that I never predicted the death of the x86. I predict that Intel will no longer be able to get much out of the P4 core. i.e. using a thinner process and clocking it faster only gets you so far, then you need to use a new core design.


This is actually a very good point. I've read that Intel is running up against a wall, performance-wise. That's not to say that they can't keep clocking their chips faster. Quite the contrary, they can, but they're reaching into the territory of diminishing returns.

I read an article that said that the x86 architecture at this point only gains about 2% processor performance for a 10% increase in clock speed. Now, doing some quick math, that means that to get a P4 that out-performs a 3GHz P4 by 10%, you'll need a 4.6GHz P4. Yes, that's right, a 60% increase in clock speed. This is the wall that Intel is hitting.

Now, the downside of this is that there will still be those who will say "I've got a 4.6GHz P4, and it can smoke your puny 2.5GHz G5." The MHz (or should that be GHz?) myth is still out there, and probably will be for some time. However, with entrants like the PPC970 on the field, things may start turning around a little faster than they have been.
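Taking the quoted rule of thumb at face value (the article isn't named, so the 2%-per-10% figure is unverified), the required clock can be worked out two ways: reading the ratio linearly, or compounding it for every 10% clock step. A quick check, assuming the 3GHz baseline from the post:

```python
import math

BASE_GHZ = 3.0
PERF_PER_STEP = 0.02   # claimed: +2% performance...
CLOCK_PER_STEP = 0.10  # ...per +10% clock speed
TARGET_GAIN = 0.10     # goal: 10% faster than the baseline chip

# Linear reading: performance gain is proportional to clock gain.
linear_clock_gain = TARGET_GAIN * CLOCK_PER_STEP / PERF_PER_STEP
linear_ghz = BASE_GHZ * (1 + linear_clock_gain)

# Compounding reading: each +10% clock step multiplies performance by 1.02.
steps = math.log(1 + TARGET_GAIN) / math.log(1 + PERF_PER_STEP)
compound_ghz = BASE_GHZ * (1 + CLOCK_PER_STEP) ** steps

print(f"linear:   {linear_ghz:.2f} GHz (+{linear_clock_gain:.0%} clock)")
print(f"compound: {compound_ghz:.2f} GHz (+{compound_ghz / BASE_GHZ - 1:.0%} clock)")
```

Both readings land around a 50 to 60 percent clock increase (roughly 4.5 to 4.75GHz), in line with the post's estimate; whether the 2% figure itself holds up is a separate question.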

Analog Kid
Jul 22, 2003, 01:07 AM
Originally posted by MrMacman
Hey, I think that can wait a few years.

I'm not exactly in college; I don't think learning how processors work will help me...

And if the G5 comes out on schedule we will all be good.

To be honest, you probably don't need to be in college to understand a lot of the basics. Not that you don't have better ways to spend your time, but you could probably get a good understanding of it by reading a book on architecture design if you're interested.

A lot of it is just visualization skills and learning the jargon.

Maybe by the time you figure it out they'll be hiring engineers again... :rolleyes:

Analog Kid
Jul 22, 2003, 01:23 AM
Originally posted by Snowy_River
This is actually a very good point. I've read that Intel is running up against a wall, performance-wise. That's not to say that they can't keep clocking their chips faster. Quite the contrary, they can, but they're reaching into the territory of diminishing returns.


I'd be very surprised if Intel doesn't know exactly where the P4 will hit the wall and doesn't have a plan to push past it.

And I doubt they're relying on the Itanic to bail them out. Ironically though, it might be AMD that keeps the x86 ISA alive while Intel re-tools.

That said, I do like the statement that IBM is gearing to ramp up just as Intel is tapering off. It's a wonderfully optimistic feeling!

We have a roadmap, folks!
There's a future!

It's gonna be a fun couple years!



Oh, and Mot is responsible for all of Apple's problems... And then IBM arrived on a brilliant white charger, armor gleaming in the sun...

ddtlm
Jul 22, 2003, 01:27 AM
DrugsBunny:

Well then, I guess you didn't predict the death/obsolescence of x86, but if that wasn't your point then I'm not sure what point you were making. Simply stating that the P4 is going to run out of steam isn't saying anything new; Intel has a new P4/P5 coming and chips after that well on the way. Also, it's not impressive that the G5, as the "first" representative of its core, is speed-competitive, because this has been the case for most newly launched processors I can think of (notable exceptions being Itanium #1, Moto 745x, recent MIPS chips and recent UltraSPARC chips). Even the original Pentium 4 was speed-competitive at launch, just not as much as its clock speed suggested.

So anyway, if all you wanted to say is that the PPC970 will crush the current P4 in the coming year then I guess I couldn't argue.

Snowy_River:

I read an article that said that the x86 architecture at this point only gains about 2% processor performance for a 10% increase in clock speed.
You may have noticed that AMD's Opteron provides more or less the same performance as a G5 at more or less the same clock speed. Imagine what Intel's research and fab abilities could do for that. They could have fabbed a processor like that 18 months ago, when they started selling P4s on 130nm.

Note how diverse x86 processors are, from the high clocking P4 that everyone wishes would die, to the slower and wider Opteron, to the Pentium M that is both powerful and power efficient. People like to single out the P4, but it is only a single x86 design.

This is the wall that Intel is hitting.
That "wall" is based on numbers that pretty much came out of thin air. There is no problem with P4 performance scaling.

DrugsBunny
Jul 22, 2003, 02:02 AM
(notable exceptions being Itanium #1, Moto 745x, recent MIPS chips and recent Ultra Sparc chips)

Notice that the common word there is recent. New designs invariably have lower yields and run at lower clocks. Sticking them on a tried and true fab process is not the holy grail of chip production no matter who the company is.


You may have noticed that AMD's Opteron provides more or less the same performance as a G5 at more or less the same clock speed

This is just FUD. In most real-world tests, current P4s and Xeons perform as well as or better than the Opteron. There are no PPC970-based machines in the wild, and there were no Opterons in the G5 optimization lab at WWDC, so no benchmarks exist to support your claim. Besides, performance != SPECint or SPECfp or SPEC anything, for that matter.

They could have fabbed a processor like that 18 months ago, when they started selling P4s on 130nm.
So why didn't they? For dramatic effect? And why would Intel be fabbing a PPC970 anyway? Fabrication isn't just a cool word, and it's only one aspect (albeit an important one) of the concept-to-product cycle of making a chip. Having a 130nm process doesn't mean you can throw arbitrary chip designs at it (especially ones you've never tried en masse before) and expect them to beat everything out there.

Note how diverse x86 processors are, from the high clocking P4 that everyone wishes would die, to the slower and wider Opteron, to the Pentium M that is both powerful and power efficient. People like to single out the P4, but it is only a single x86 design.
Yes, it's a wonderful smorgasbord. What's the point of this statement? And this is one area where I think Motorola had Intel soundly beat (power vs. performance, re: the Pentium M).

There is no problem with P4 performance scaling.
There are problems scaling every processor to higher clocks. What's this claim based on? Does Intel's manufacturing process automagically remove clock skew, signal propagation issues, power dissipation issues, packaging issues? And that's just the basics. No process or chip has "no problem" scaling. If they move to a new process, it will take fine-tuning to reach the yields and performance they have now, and so on ad infinitum. The point to all this is that I maintain it is impressive that the 970 has competitive performance, given that it's a completely new design at a new plant and, for a PPC, a new process. My other point is that, end of x86 or not, Intel has very strong competition from now on in IBM and the PPC9xx and will need to come up with something more impressive than P4 speed increments.

sparkplug
Jul 22, 2003, 04:53 AM
Rubinstein, Akrout interviewed by DMN (http://www.digitalvideoediting.com/2003/06_jun/features/cw_macg5_interview.htm)


------a sample-----
DMN: Now, you're saying it's the first 64-bit desktop machine. But isn't there an Opteron dual-processor machine? It shipped on June 4th. BOXX Technologies shipped it. It has an Opteron 244 in it.

Rubinstein: Uh...

Akrout: It's not a desktop.

DMN: That's a desktop unit.

Akrout: It depends on what you call a desktop, now. These… From a full desktop per se, this is the first one. I don't know how you really distinguish the other one as a desktop.

DMN: Well, it's a dual processor desktop machine, just like that one.

Akrout: It's not 64, then.

DMN: Yes, it's a 64-bit machine with two Opteron chips in it. It started shipping June 4th.

Akrout: That we'll double check, but in my mind, it wasn't.


------then later on ------

DMN: You guys have probably watched the development of the AMD Opteron chip. How does this compare to the Opteron -- the way it works, the speed? What do you think are the differences and similarities between the two chips?

Akrout: They are both 64-bit, but as you know, the PowerPC is RISC architecture and they're more like, kind of CISC architecture. So there's that fundamental architecture difference. So there are some differences. You mentioned how they already have a desktop -- I'll have to double check that. I wasn't aware of that. What we've done here with the G5 -- it provides us with the first 64-bit architecture for the desktop.

DMN: Would you say that hertz-per-hertz, is it the same speed? Would a 1.8GHz Opteron 244 be comparable to a 1.8GHz G5 chip?

Rubinstein: I don't think we can really answer that. What we've done is, we've benchmarked against the fastest machines that are available out there today. And that's the 3GHz Pentium 4, and the dual 3.06 GHz Xeon. Those are machines you can go to places like Dell -- it's where we get them from -- and then we ran a variety of benchmarks. We took GCC 3.3 and we ran SPEC and SPEC Rate across both of them. We have run a variety of applications that run on both machines, and it's very clear that the G5 is the winner, hands-down.

----end----

Does anyone actually believe these people? Akrout contradicts himself, given his apparent awareness that the G5 wasn't "first" on the 64-bit front.

Personally, I believe it's all spin and FUD until I can run my apps on a G5 myself and make my own comparisons. Before paying, not after.

Analog Kid
Jul 22, 2003, 06:17 AM
Originally posted by sparkplug

Does anyone actually believe these people? Akrout contradicts himself, given his apparent awareness that the G5 wasn't "first" on the 64-bit front.

Personally, I believe it's all spin and FUD until I can run my apps on a G5 myself and make my own comparisons. Before paying, not after.

It sounds like he's just accepting what the interviewer is telling him-- that such a thing exists. Boxx isn't really a household name, and the Opteron is marketed as a workstation processor (as opposed to the desktop Athlon).

Yes, the difference between desktop and workstation is just a marketing difference. I don't consider an Ultra-Sparc to be a desktop, even though it's about the right form factor. Just raised that way I guess...

I could just as easily call the g5 a workstation or the xserve a desktop pizza box. It's a question of how it's used.

Just configured a Boxx and G5 to similar specs (dual 1.8GHz Opterons, dual 2GHz G5s, 2GB RAM, similar peripherals)-- the Boxx came up about $1000 more...

I don't know how fast the Opteron processors that I configured are... I'd have to poke around the AMD site.

If your point is that you have to double check marketing smoke, I won't disagree. If your point is that the G5 has been discredited because of ambiguity between desktops and workstations I will disagree, but whatever.

Quila
Jul 22, 2003, 07:09 AM
Originally posted by sparkplug
Akrout: They are both 64-bit, but as you know, the PowerPC is RISC architecture and they're more like, kind of CISC architecture. So there's that fundamental architecture difference. So there are some differences. You mentioned how they already have a desktop -- I'll have to double check that. I wasn't aware of that. What we've done here with the G5 -- it provides us with the first 64-bit architecture for the desktop.

People need to read more Ars Technica. Ars did an Athlon vs. G4 paper a while ago, and it turns out that, apart from the instruction set, the two are more similar to each other than either is to the P4.

And for the person wanting to understand the processor speak, a good read of the rest of Hannibal's articles on processor and memory architecture will have him set -- no university necessary.

Aside from that, the "first 64-bit desktop" claims just depend on the definition of "desktop." I know people who have $10,000 UltraSPARCs in their offices who say "This is a 64-bit desktop."

Cubeboy
Jul 22, 2003, 07:33 AM
Originally posted by Quila
People need to read more Ars Technica. Ars did an Athlon vs. G4 paper a while ago, and it turns out that, apart from the instruction set, the two are more similar to each other than either is to the P4.


I never did understand how in the world Hannibal reached that conclusion. To me, the design philosophy of the Athlon, with its triple FPUs, triple decoders, triple ALUs, or in other words, nine execution units, and large caches/buffers, seemed to focus on raw number-crunching power, whereas the design philosophy of the G4 seems to focus on efficiency. True, the Athlon is very close to a full-fledged RISC chip, but that doesn't mean it's close to the G4. In fact, the design philosophies seem to be at opposite ends.

Cubeboy
Jul 22, 2003, 08:10 AM
Originally posted by Snowy_River
This is actually a very good point. I've read that Intel is running up against a wall, performance-wise. That's not to say that they can't keep clocking their chips faster. Quite the contrary, they can, but they're reaching into the territory of diminishing returns.

I read an article that said that the x86 architecture at this point only gains about 2% processor performance for a 10% increase in clock speed. Now, doing some quick math, that means that to get a P4 that out-performs a 3GHz P4 by 10%, you'll need a 4.6GHz P4. Yes, that's right, a 60% increase in clock speed. This is the wall that Intel is hitting.

Now, the downside of this is that there will still be those who will say "I've got a 4.6GHz P4, and it can smoke your puny 2.5GHz G5." The MHz (or should that be GHz?) myth is still out there, and probably will be for some time. However, with entrants like the PPC970 on the field, things may start turning around a little faster than they have been.

Most of the diminishing return stuff comes from memory not scaling as fast as the cpu but with DDR2 and Quad Channel RDRAM coming out as well as others, I sincerely doubt we'll be seeing many problems from this.

So far, scaling seems to be pretty good actually: the 3.2 GHz Pentium 4 is typically around 5% faster than the 3 GHz Pentium 4 in real-world benchmarks and SPEC, which is very good considering the Pentium 4 is nearing the end of its production life.

I would be VERY surprised if Prescott is not clock-for-clock superior to the Pentium 4, considering all the modifications that are going into it.
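The scaling figure in this post can be put in the same terms as the rule of thumb quoted above: divide the relative performance gain by the relative clock gain. A sketch using the post's own approximate numbers; the function name is hypothetical.

```python
def scaling_efficiency(old_ghz: float, new_ghz: float, perf_gain: float) -> float:
    """Fraction of a clock-speed increase that shows up as performance.

    perf_gain is the relative speedup, e.g. 0.05 for "5% faster".
    """
    clock_gain = new_ghz / old_ghz - 1
    return perf_gain / clock_gain


# Post's figures: 3.0 -> 3.2 GHz, roughly 5% faster in benchmarks.
print(f"P4 3.0 -> 3.2 GHz: {scaling_efficiency(3.0, 3.2, 0.05):.0%}")
# The quoted "2% per 10% clock" rule of thumb, for comparison.
print(f"quoted rule of thumb: {scaling_efficiency(3.0, 3.3, 0.02):.0%}")
```

By this measure the post's numbers imply roughly 75% scaling efficiency, versus the 20% the quoted rule of thumb would imply.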

Quila
Jul 22, 2003, 09:05 AM
Originally posted by Cubeboy
I never did understand how in the world Hannibal reached that conclusion.

Basically, as opposed to the P4's brute force, long and narrow, crank it up to high-speed architecture.

The other two are really more elegant.

ddtlm
Jul 22, 2003, 10:51 AM
DrugsBunny:

Notice that the common word there is recent. New designs invariably have lower yields and run at lower clocks.
You pulled this excuse out of thin air. Every single one of the chips I listed was an inferior design, which was the main problem, not fab tech.

This is just FUD.
Do you know what FUD means?

In most real world tests, current P4s and Xeons perform as well or better than the Opteron.
As well or better, yep. The recent single-CPU Opteron tests I've seen (after the first round of AMD Zone tests, which were pulled pending a newer motherboard) show the Opteron at 1.8GHz often losing to the 3.2GHz P4 and sometimes winning, but overall quite competitive.

There are no PPC970-based machines in the wild and there were no Opetrons in the G5-optimization lab at WWDC so no benchmarks exist to support your claim.
There is plenty of basis for my "more or less" claim. The Opteron is more or less competitive with the top Intel x86 chips, which are more or less competitive with the G5.

Besides, performance != SPECint or SPECfp or SPEC anything for that matter.
It is the only cross-platform benchmark available to us, and it does include a lot of subtests based on real programs. SPEC does support my "more-or-less" claim, you may have noticed.

So why didn't they? For dramatic effect? And why would intel be fabbing a PPC970 for anyway?
I meant the Opteron, but that's not important. The point is that Intel could have made an Opteron-like processor instead of a P4, and that they could have fabbed the darn thing 18 months ago. They don't do this because they don't want to kill Itanium, the same reason they are playing dumb about 64 bits on the desktop.

Fabrication isn't just a cool word and it's only one aspect (albeit, an important aspect) of the concept-to-product cycle of making a chip.
Huh? Of course they can't just slap a chip together and fab it instantly, but seeing as they've already fabbed four processor designs on 130nm (P4, PM, P3, and Itanium 2.5), there is no reason to wonder whether they could have fabbed a different chip.

Having a 130nm process doesn't mean you can throw arbitrary chip designs at it (especially ones you've never tried en masse before) and expect it to beat everything out there.
I don't know why you are trying to make this sound so hard, but AMD has a very small R&D budget and very little fab capacity compared to Intel, and you can assume anything AMD has done, Intel could do. The fact that Opteron exists and performs as well as it does shows how "easy" it is to make powerful-per-clock x86 chips.

Yes, it's a wonderful smorgesboard. What's the point of this statement.
To address the "the x86 architecture at this point only gains about 2% processor performance for a 10% increase in clock speed" commentary from Snowy_River. Notice how establishing the wide variety of x86 designs which scale and behave differently nicely deals with it.

And this is one area where I think Motorola had Intel soundly beat (power vs performance re: Pentium M).
Soundly beat, eh? Do you believe this based on faith, or can you substantiate your claims?

There are problems scaling every processor to higher clocks. What's this claim based on? Does Intel's manufacturing process automagically remove clock skew, signal propagation issues, power dissipation issues, packaging issues? And that's just the basics. No process or chip has "no problem" scaling.
Intel has "no problem" scaling the P4 in the same way I have "no problem" walking across the room. Both are very complex operations when examined in detail, but neither is difficult for the organism doing them. Throwing out semi-technobabble really fails to prove anything and really doesn't help your argument.

The point to all this is that I maintain that it is impressive that the 970 has competitive performance given that it's a completely new design at a new plant, and for a PPC, a new process.
AFAIK the Power4+ and 750fx were fabbed on the same process. But anyway, I still fail to see how the G5's being more-or-less competitive is impressive. Like I said, even the original P4 was more-or-less competitive.

My other point is that, end of x86 or not, Intel has very strong competition from now on in IBM and the PPC9xx and will need to come up with something more impressive than the P4 plus speed increments.
Of course eventually the P4 will be replaced, but the upcoming P4/P5 design should do fine against the PPC chips it will face. Even now, before the G5 has shipped, it's not clear that the G5 is faster. Both the P4 and the Xeon have been updated since Apple's benchmarking was done. Ponder what the situation will be in 6 months when Apple has probably announced but not shipped G5's in the ballpark of 2.5ghz.

Cubeboy
Jul 22, 2003, 03:59 PM
Originally posted by Quila
Basically, as opposed to the P4's brute force, long and narrow, crank it up to high-speed architecture.

The other two are really more elegant.

I can understand the "long and narrow" part but the brute force part just seems unexplainable. You have to understand, clock speed isn't the only measure of "brute force", if we're talking about raw cpu power, IPC also plays a role. In this case, the Athlon is going to have a lot of "brute power", more than the Pentium 4 in many instances.

The reason it isn't particularly "efficient" has much to do with its higher latency L1 and L2 caches and the very nature of x86 code (low ILP, lots of loads and stores). In reality, there are very few instances where the Athlon can use the full power of its execution units; worse, its high latencies can cause significant performance hits if a cache miss occurs, especially in the L2 cache.

DrugsBunny
Jul 22, 2003, 10:26 PM
Every single one of the chips I listed was an inferior design, which was the main problem, not fab tech.

Inferior in what sense? Poorly written HDL? Because ars technica said so? A forum on AOL said so? And unless you work at any of the companies concerned and/or are a semiconductor engineer, how do you know what the main problems were? Baseless claims indeed.

There is plenty of basis for my "more or less" claim. The Opteron is more-or-less competitive with the top Intel x86 chips which are more-or-less competitive with the G5.
This is based around the SPEC results released by Apple running gcc. Do you even know what an FPGA is??? Check what "real world" programs are included in SPECint and then make a list of how many you actually use (heard of?) and divide that by 12 to get a rough idea of how relevant these tests are. I use maybe 2 on a regular basis. The SPECfp tests are even more irrelevant.

It is the only cross-platform benchmark available to us, and it does include a lot of subtests based on real programs.
See above in case it wasn't obvious that they mean nothing. Grab the 2 machines in question, software that will be used in each and run some typical tasks. That's a fair and genuinely useful comparison. Benchmark suites have always meant nothing and will continue to do so while we all run different programs and OSs with different RAM, HD and graphics cards.

Soundly beat, eh? Do you believe this based on faith, or can you substantiate you claims?
PPC7455 1Ghz- 19.9W
1.7GHz Pentium 4-M 35W (20.58W/Ghz)
This is comparison on clock speed alone and does not take into account the processors' relative performance.
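The watts-per-gigahertz figures quoted above are plain division of rated power by clock speed. A minimal Python sketch (the helper name `watts_per_ghz` is hypothetical, and the inputs are only the figures quoted in the post) reproduces them; note that 35 / 1.7 is about 20.59, which the post truncates to 20.58. As the post itself says, this normalizes by clock only, not by per-clock performance.

```python
def watts_per_ghz(watts: float, ghz: float) -> float:
    """Rated power divided by clock speed; says nothing about work done per clock."""
    return watts / ghz


# Figures as quoted in the post (rated watts, clock in GHz)
chips = {
    "PPC7455 @ 1.0 GHz": (19.9, 1.0),
    "Pentium 4-M @ 1.7 GHz": (35.0, 1.7),
}

for name, (watts, ghz) in chips.items():
    print(f"{name}: {watts_per_ghz(watts, ghz):.2f} W/GHz")
```

Because power does not scale linearly with clock (see the later replies), a W/GHz ratio taken at one clock speed does not carry over to another.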


Intel has "no problem" scaling the P4 in the same way I have "no problem" walking across the room. Both are very complex operations when examined in detail, but neither is difficult for the organism doing them. Throwing out semi-technobabble really fails to prove anything and really doesn't help your argument.

How do you support your claim of Intel's limitless capabilities of scaling the P4? And yet somewhere below, you inform us that they'll eventually replace the P4. This is a discussion/debate on architecture, performance, design and fabrication/manufacturing of the chips concerned. Like every field, semiconductor engineering has its jargon to enable quick and effective communication within the engineering community. Intel's engineers have studied at the same places as those at IBM, Mot, AMD etc. and know all the same tricks and pursue much of the same research to overcome the problems you dismiss as technobabble. Rest assured that if Intel had come up with a way to scale clock speeds from the same core design in the fashion you describe, the other guys can't be far behind.

But anyway, I still fail to see how the G5's being more-or-less competitive is impressive.
This has already been addressed. I apologize if you are unable to see that.

Sun Baked
Jul 22, 2003, 11:16 PM
Originally posted by DrugsBunny
Inferior in what sense? Poorly written HDL? Because ars technica said so? A forum on AOL said so? And unless you work at any of the companies concerned and/or are a semiconductor engineer, how do you know what the main problems were? Baseless claims indeed.

There are engineers reading and following the ARS Technica forum, and the articles being written by the members (Hannibal and the gang).

IBM's asked to use some of the material that came out of the discussions, and it's even shown up in Apple's G5 marketing material.

So I don't see the basis for saying the gang in the GPUL threads doesn't know what they are talking about.

They've been hard on GCC compilers for the last year (well mainly one member), and when everybody saw the benchmark tests it sort of exploded.

They may not have all the info, but they've done a decent job with the little info out there that did show up.

ddtlm
Jul 23, 2003, 04:31 AM
DrugsBunny:

Inferior in what sense?
That you even ask this question labels you in ways you probably wouldn't like. You might review that list, I included some blatant and semi-famous crappy chips on it including the Itanium #1 and UltraSparcs (by which I mean #2 and #3). Actually I'm not sure how the performance of the US2 was at launch, but by the time they were replaced it was pretty bad.

Check what "real world" programs are included in SPECint and then make a list of how many you actually use (heard of?) and divide that by 12 to get a rough idea of how relevant these tests are. I use maybe 2 on a regular basis. The SPECfp tests are even more irrelevant.
I use 3 at least, all integer. (Those being gcc, gzip and bzip2.) But that's beside the point. Neither you nor I are in a position to declare SPEC irrelevant because we don't use the code in the other tests. A lot of the SPECfp stuff is science and engineering stuff, which would be perfectly relevant if I were either of those. So to summarise that, you look pretty silly brushing off SPEC because you don't use the subtests. Perhaps you don't care about what SPEC tests, but that doesn't make it non-representative of overall performance for the broader public.

Benchmark suites have always meant nothing and will continue to do so while we all run different programs and OSs with different RAM, HD and graphics cards.
Well, perhaps you could make a valid argument that benchmarks of programs you'll never use are not relevant to you, but I find your claim that our differing hardware makes benchmark suites irrelevant odd. Last time I checked, most benchmark suites were designed to evaluate hardware.

PPC7455 1Ghz- 19.9W
1.7GHz Pentium 4-M 35W (20.58W/Ghz)
Not only does your example fail to show any "sound beating", but it takes a maximum-clocked P-M against a medium-clocked G4. The way you keep throwing in extraneous technobabble, you should find it particularly embarrassing that I have to point out that energy usage scales faster than clock speed, so I could pick a 1.42ghz G4 and give the G4 unfavorable watt/ghz numbers, or pick a genuine 1ghz P-M and get a similar effect. Worse still, since performance doesn't scale as fast as clock speed, the watts/performance of a 1.42ghz G4 would look even worse compared to a 1ghz P-M if I chose to compare them (seems as fair as the comparison you chose). Sadly I do not have any numbers to establish watt/performance, and even if I did we'd just be back talking about benchmarks.

How do you support your claim of intel's limitless capabilities of scaling the P4?
As you saw yourself, I don't think it's anywhere near "limitless", so why even say this? Anyway, I base my commentary on how easy it has been for them to release higher clocked parts whenever they want, and how easily people are able to overclock the chips. Heck, it's so easy for them that they make the Celeron on a P4 core and clock it at what, 2.6ghz? That's a cheap processor too. They'd rather sell those at 2.6ghz than standard P4's with more cache and more-or-less the same performance at much lower clock speeds.

Intel's engineers have studied at the same places as those at IBM, Mot, AMD etc. and know all the same tricks and pursue much of the same research to overcome the problems you dismiss as technobabble.
I call it semi-technobabble because you are throwing it out where it doesn't need to be, as if you think lots of big words will cause me to agree with you. Like I said, it doesn't help your argument at all, and in light of your mistakes it doesn't do much for your credibility either.

DrugsBunny
Jul 23, 2003, 07:28 AM
That you even ask this question labels you in ways you probably wouldn't like.
Name-calling?

A lot of the SPECfp stuff is science and engineering stuff, which would be perfectly relevant if I were either of those. So to summarise that, you look pretty silly brushing off SPEC because you don't use the subtests. Perhaps you don't care about what SPEC tests, but that doesn't make it non-representative of overall performance for the broader public.
I suppose the broader public consists of scientists and engineers? I refer you to H & P, Computer Architecture: A Quantitative Approach, on the common view of benchmarks in the industry, or ask one of the engineers on comp.arch.



Last time I checked most benchmark suites were designed to evaluate hardware.
You miss the point. If Apple publishes SPEC scores using a "standard" configuration, that doesn't say how the system might perform in a configuration I might want to buy, so they are largely useless in evaluating machines for purchase.

Not only does your example fail to show any "sound beating", but it takes a maximum-clocked P-M against a medium-clocked G4.
The comparison is fair as both machines represent the processors used in top-of-the-line portables.

...energy usage scales faster than clock speed...
This doesn't even make sense. Power usage increases with the square of the clock speed, if that's what you mean, but when the clock speeds are like these, the constant of proportionality is more important. The more important feature of the comparison is that they have similar performance on benchmarks and real-world tests and there's a 15W difference in power dissipation.

Chryx
Jul 23, 2003, 11:45 AM
Originally posted by DrugsBunny

PPC7455 1Ghz- 19.9W
1.7GHz Pentium 4-M 35W (20.58W/Ghz)
This is comparison on clock speed alone and does not take into account the processors' relative performance.


The Pentium 4-M and the Pentium M are not the same chip. The Pentium M has the best watts/overall performance I'm aware of right now, and soundly beats out the 7455A.

A quick Google search has failed to turn up the exact specs, but it's in 750FX territory as far as power consumption goes, IIRC.

ddtlm
Jul 23, 2003, 01:21 PM
DrugsBunny:

Name-calling?
I did what I could to keep my criticism mild. Really, with all that techspeak you should know that those were some weak designs.

I suppose the broader public consists of scientists and engineers?
A heck of a lot broader than you or I! Or even us combined. On a SPEC note, I just remembered that SPEC includes Perl tests too, so I use at least 4 of their integer tests.

I refer you to H & P, Computer Architecture: A Quantitative Approach, on the common view of benchmarks in the industry, or ask one of the engineers on comp.arch.
Rather than name-dropping, why don't you tell me what I should find there and how it's going to support you? I'm sure not going to go read a book or visit IRC to find out what you intend me to discover.

You miss the point. If Apple publishes SPEC scores using a "standard" configuration, that doesn't say how the system might perform in a configuration I might want to buy, so they are largely useless in evaluating machines for purchase.
So you'd rather "run some standard tasks" on every possible configuration of hardware? Next time someone asks what "standard tasks" to evaluate a machine by, I'll make sure to refer them to you.

The comparison is fair as both machines represent the processors used in top-of-the-line portables.
Hmmm, as Chryx just pointed out, it seems that I overlooked that you were referring to the Pentium 4-M in your comparison. Strange, seeing as how its performance is far inferior to the Pentium M at 1.7ghz and also inferior to the Pentium 4-M's that are available at clock speeds clear to 2.6ghz (though I've never seen one higher than 2.4ghz). But the important thing here is really that you think it is fair to compare top-end portable processors based on power usage. This approach fails to recognize that processor speeds are determined by consumer demand, not some rule of ideal watts/performance.

Power usage increases with the square of the clock speed, if that's what you mean, but when the clock speeds are like these, the constant of proportionality is more important.
Power usage scales as a square of voltage, and voltage tends to increase with clock speed, but energy use does not scale as a square of clock speed itself. See how much more concise my original statement about "power use scaling faster than clock speed" is? Technobabble is bad.
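The distinction above follows from the usual first-order model of CMOS switching power, P ≈ α·C·V²·f (activity factor, switched capacitance, supply voltage, clock), which is linear in clock and quadratic in voltage. A minimal Python sketch, assuming made-up numbers (10 nF effective capacitance, 1.0 V at 1 GHz, and a hypothetical 1.2x voltage bump needed for a 1.5x clock) and ignoring leakage, shows why power grows faster than clock speed only because voltage tends to rise with it.

```python
def dynamic_power(c_eff: float, volts: float, freq_hz: float, activity: float = 1.0) -> float:
    """First-order CMOS switching power: P = activity * C * V^2 * f (leakage ignored)."""
    return activity * c_eff * volts ** 2 * freq_hz


# Hypothetical chip: 10 nF effective switched capacitance, 1.0 V at 1 GHz (about 10 W).
C_EFF = 10e-9
BASE_V, BASE_F = 1.0, 1.0e9
base = dynamic_power(C_EFF, BASE_V, BASE_F)

# Case 1: raise the clock 1.5x at the same voltage -> power rises only 1.5x (linear in f).
fixed_v = dynamic_power(C_EFF, BASE_V, 1.5 * BASE_F) / base

# Case 2: the higher clock also needs 1.2x the voltage (assumed) -> 1.5 * 1.2^2 = 2.16x.
scaled_v = dynamic_power(C_EFF, 1.2 * BASE_V, 1.5 * BASE_F) / base

print(f"1.5x clock, same voltage: {fixed_v:.2f}x power")
print(f"1.5x clock, 1.2x voltage: {scaled_v:.2f}x power")
```

The exact voltage needed for a given clock depends on the process and the design, so the 1.2x figure is purely illustrative.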

The more important feature of the comparison is that they have similar performance on benchmarks and real-world tests and there's a 15W difference in power dissipation.
It's time to look at the power usage of Pentium M's, not Pentium 4-M's. Intel has a PDF available here: http://www.intel.com/design/mobile/datashts/252612.htm?iid=ipp_dlc_procpmp+info_datasheet& that on page 72 lists the thermal design power (TDP), which is not quite the same as the numbers from Moto because, as you may know, the TDP allows for the power-saving trickery of the P-M to head off occasional overheating (which is also discussed in the document). Anyway, at 1ghz the TDP of a Pentium M is only 7 watts, less than 1/3 of what the 1.7ghz chip is rated for, less than a 1ghz 7455, and less than even a 1ghz 7457. I pretty much expect that you'll protest the use of TDP, but it is a realistic measure of power use because power use is simply capped by throttling performance. Of course this opens up a whole new avenue of squirmage for you, where you can cast doubt on how fast the chips are before they throttle, but there are real PC vs PC benchmarks out there where the Pentium M performs quite well (see Anandtech, Tom's). Based on my reading, the Pentium M performs somewhere like an Athlon clock-for-clock, which leads me to expect that a P-M as fast as a 1ghz 7455 is clocked much closer to 1ghz than 1.7ghz. That would mean its TDP is much closer to 7 watts than 25 watts. I think I'd be willing to say the 7457 probably has better watt/performance at 1ghz, but of course that needs to be reevaluated at different clock speeds because voltage levels are so critical to power usage. Anyway, to the original point that started this whole sub-discussion, there is certainly no "sound beating" of the P-M by the G4.

vrapan
Jul 23, 2003, 02:21 PM
I am not an engineer or a circuit designer, but I read somewhere that the P4 hits the wall for the following reason. The very large number of transistors on such a small piece of silicon has raised a problem: electricity escapes because the walls inside the transistors are so thin that they cannot completely insulate it. This means that more and more power is lost as the number of transistors or the clock speed increases, making the chip extremely power hungry. Moore himself said that with the current design the P4 cannot go much higher in clock speed and definitely cannot keep doubling its speed every 18 months. If I am talking bull, shut me up please :-)

ddtlm
Jul 23, 2003, 02:49 PM
vrapan:

I think what you have said is essentially correct, except that all processors will increasingly face that problem as they are produced with smaller and smaller features. At 130nm I gather it wasn't too bad, but it's looking worse for 90nm. There has been some news recently that the 90nm P4/P5 sucks a lot more power than originally planned.