Short Pipeline is better. Long pipeline is better.

acj · Aug 2, 2003

How do you know what to believe, and when?

The G4's 4-stage, and later 7-stage, pipeline was always advertised as a strong point compared to the Pentium 4, but now the G5 has a 16-stage integer pipeline, a 21-stage FP pipeline, and up to 25 stages for the Velocity Engine.

Now they mention over 200 simultaneous instructions compared to the P4's 126, but they never mentioned this with the G4's measly 16. Obviously the P4 wasn't over 7 times faster than the G4.
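One way to see why more instructions in flight doesn't translate directly into speed is Little's law: sustained throughput (instructions completed per cycle) equals the number of instructions in flight divided by the average number of cycles each one stays in flight. A deeper, more heavily speculating core needs many more instructions in flight just to keep the same throughput. A minimal sketch; the latency values below are hypothetical and chosen only to make the arithmetic clean.

```python
def sustained_ipc(in_flight: float, avg_latency_cycles: float) -> float:
    """Little's law: throughput = work in progress / time each item spends in the system."""
    return in_flight / avg_latency_cycles

# Hypothetical average in-flight latencies (in cycles), for illustration only.
g4 = sustained_ipc(in_flight=16, avg_latency_cycles=8)
p4 = sustained_ipc(in_flight=126, avg_latency_cycles=63)

print(g4, p4)     # 2.0 2.0
print(126 / 16)   # 7.875
```

Under these made-up latencies, nearly 8 times as many instructions in flight buys exactly the same instructions per cycle.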

It seems that no single number really matters, and if speed is all that's important to you, you need to compare computers side by side with what YOU do and see what's best for yourself.

iJon · Aug 2, 2003

Ah, come on, we all know this. The whole MHz Myth was just supposed to be music to the ears of customers who wanted a computer for speed, back when Apple was getting their butts kicked by Intel and AMD.

iJon

Powerbook G5 · Aug 2, 2003

Exactly, all you need to know is that the G5 is one higher than the P4, why would you want a mere 4 when you can have a 5?

bousozoku · Aug 2, 2003

When the 604 and the P2 went head-to-head, the 604 was better by quite a margin. When the P3 arrived, the margin narrowed for the 604e.

The G4 is just a poorly designed economy processor with an above average SIMD unit.

A short pipeline is better when the processor guesses a branch wrong, because it has to flush everything and start over. If it guesses correctly, a long pipeline is obviously going to help, because all the instructions and data are available and can keep the processor going at full pace.
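bousozoku's point can be sketched with a toy timing model. All of it is hypothetical: an in-order pipeline that accepts one instruction per cycle, branches that are only resolved in the last stage (so every mispredict throws away depth - 1 cycles), and a deeper design that is allowed to run at twice the clock. The mispredict counts are deliberately exaggerated to show the crossover.

```python
def run_time_ns(depth: int, instructions: int, mispredicts: int, clock_ghz: float) -> float:
    """Toy model of an in-order pipeline that accepts one instruction per cycle.

    Assumes a branch is only resolved in the last stage, so every mispredict
    flushes the pipeline and costs depth - 1 refill cycles.
    """
    fill_cycles = depth - 1                    # cycles before the first instruction completes
    flush_cycles = mispredicts * (depth - 1)   # refill cost of every wrong guess
    cycles = fill_cycles + instructions + flush_cycles
    return cycles / clock_ghz                  # at 1 GHz, one cycle takes 1 ns

# Hypothetical chips: a 7-stage core at 1.0 GHz vs a 20-stage core at 2.0 GHz.
for mispredicts in (0, 100, 300):
    shallow = run_time_ns(7, 1000, mispredicts, 1.0)
    deep = run_time_ns(20, 1000, mispredicts, 2.0)
    print(mispredicts, shallow, deep)
# 0 1006.0 509.5
# 100 1606.0 1459.5
# 300 2806.0 3359.5
```

When guesses are right, the deep, fast pipeline wins easily; once enough guesses are wrong, the refill cost overtakes its clock advantage.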

simX · Aug 2, 2003

Re: Short Pipeline is better. Long pipeline is better.

Originally posted by acj
How do you know what to believe, and when?

The G4's 4-stage, and later 7-stage, pipeline was always advertised as a strong point compared to the Pentium 4, but now the G5 has a 16-stage integer pipeline, a 21-stage FP pipeline, and up to 25 stages for the Velocity Engine.

Now they mention over 200 simultaneous instructions compared to the P4's 126, but they never mentioned this with the G4's measly 16. Obviously the P4 wasn't over 7 times faster than the G4.

It seems that no single number really matters, and if speed is all that's important to you, you need to compare computers side by side with what YOU do and see what's best for yourself.

It all really depends, and your last statement is pretty much right on target.

From my (very limited) understanding, longer pipelines let you keep more instructions in flight and run at a higher clock speed, but they are a drawback when "bubbles" appear in the pipeline. A "bubble" happens when an instruction needs the result of another instruction, so it has to wait for that other instruction to finish... and in a long pipeline that wait lasts more cycles, so each bubble wastes more of the processor's time.
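simX's description of bubbles can be shown with a toy in-order pipeline. This is a sketch under stated assumptions: one instruction issues per cycle, an instruction waits until its source registers are ready, and a result becomes usable `latency` cycles after issue. The `Instr` class and the register names are hypothetical. A chain of dependent instructions stalls more as latency grows, while independent instructions never stall.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    dest: str                 # register this instruction writes
    srcs: tuple[str, ...]     # registers it reads


def count_bubbles(program: list[Instr], latency: int) -> int:
    """Issue one instruction per cycle, in order, stalling until source registers are ready.

    `latency` is the number of cycles from issue until a result can be used;
    deeper pipelines typically make this number larger.
    """
    ready_at: dict[str, int] = {}   # register -> first cycle its value is usable
    cycle = 0
    bubbles = 0
    for ins in program:
        earliest = max([ready_at.get(r, 0) for r in ins.srcs], default=0)
        if earliest > cycle:
            bubbles += earliest - cycle   # empty issue slots while waiting
            cycle = earliest
        ready_at[ins.dest] = cycle + latency
        cycle += 1
    return bubbles


chain = [Instr("r1", ()), Instr("r2", ("r1",)), Instr("r3", ("r2",))]
independent = [Instr("r1", ()), Instr("r2", ()), Instr("r3", ())]

print([count_bubbles(chain, lat) for lat in (1, 3, 5)])        # [0, 4, 8]
print([count_bubbles(independent, lat) for lat in (1, 3, 5)])  # [0, 0, 0]
```

Out-of-order execution exists largely to fill those empty slots with independent work from elsewhere in the program.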

But like you said, these drawbacks can be overcome by other design considerations, so it's best just to compare real-world performance in applications you use.

MrMacMan · Aug 2, 2003

Yes, this is why you can't simply calculate whether a shorter or longer pipeline is better.

Powerbook G5 · Aug 2, 2003

I wonder how the branch prediction on the G5 will affect its longer pipeline. I know Steve and the IBM guy both said it predicts correctly a good 90% of the time or so, but for that 1 in 10, the error is going to be felt more on a longer pipeline, isn't it?
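The question can be put into rough numbers: the average cost of mispredictions per instruction is about (fraction of instructions that are branches) × (mispredict rate) × (cycles lost per mispredict). A minimal sketch; the 20% branch fraction and the 6- and 15-cycle penalties (roughly "depth minus one" for 7- and 16-stage pipelines) are assumptions for illustration, not published figures for the G4 or G5.

```python
def added_cpi(branch_fraction: float, accuracy: float, penalty_cycles: int) -> float:
    """Average extra cycles per instruction lost to branch mispredictions."""
    mispredict_rate = 1.0 - accuracy
    return branch_fraction * mispredict_rate * penalty_cycles

# Hypothetical inputs: 1 in 5 instructions is a branch.
print(round(added_cpi(0.2, 0.90, 6), 2))    # 0.12  shallow pipeline, 90% accuracy
print(round(added_cpi(0.2, 0.90, 15), 2))   # 0.3   deep pipeline, 90% accuracy
print(round(added_cpi(0.2, 0.95, 15), 2))   # 0.15  deep pipeline, 95% accuracy
```

So yes: at the same accuracy, the deeper pipeline loses about 2.5 times as much per instruction to mispredictions, and each point of prediction accuracy is worth more the longer the pipeline gets.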

Catfish_Man · Aug 2, 2003

Pipeline depth IS important, but it's not the final word in performance. When balanced out by a good cache, good out-of-order execution (OOOE), and good branch prediction (like a P4 or G5), lengthening the pipeline is an effective way of increasing performance (by raising the clock frequency). In a processor like the G4, with little to no OOOE (because embedded programs tend to be hand scheduled, and OOOE increases power consumption), and only mediocre branch prediction (acceptable because the short pipeline keeps mispredictions cheap), lengthening the pipeline would probably have been a bad idea. It would DEFINITELY have been a bad idea for its target market (high-end embedded), which is notoriously latency and power conscious. The G4 and G4+ served their intended purpose quite well, although the massively delayed transition to 0.13 micron and the lack of an on-chip memory controller are beginning to hamper them. The fact that they made pretty decent P3 killers was (mostly) just an added bonus.

The G5, on the other hand, seems squarely targeted at Xeons and Opterons (and to a lesser extent the P4, Athlon XP, and Athlon 64), which is just about perfect for Apple's purposes. In certain respects it's designed in a fairly similar way to them: it has a long pipeline, extensive OOOE, and large caches. This allows it to scale much higher than the G4 and makes it better suited to running poorly optimized code (which is what most code is). The downside is that it has significantly higher power consumption and manufacturing cost than a G4 made on the same manufacturing process.
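The "raise the clock by lengthening the pipeline" argument is often sketched with a simple model: a fixed amount of logic is split evenly into `depth` stages, and each stage also pays a fixed latch overhead. The 6000 ps of logic and 100 ps of overhead below are made-up round numbers, not figures for any real chip.

```python
def clock_ghz(depth: int, total_logic_ps: float = 6000.0, latch_overhead_ps: float = 100.0) -> float:
    """Clock rate when a fixed amount of logic is split evenly across `depth` stages.

    Every stage also pays a fixed latch/register overhead, so returns diminish.
    """
    cycle_ps = total_logic_ps / depth + latch_overhead_ps
    return 1000.0 / cycle_ps   # 1000 ps per ns, so cycles per ns = GHz

for depth in (5, 10, 20, 40):
    print(depth, round(clock_ghz(depth), 2))
# 5 0.77
# 10 1.43
# 20 2.5
# 40 4.0
```

In this model, going from 20 to 40 stages raises the clock by only 1.6x, while every extra stage also lengthens the mispredict penalty, which is why a deep pipeline needs the caches, OOOE, and branch prediction described in the post to pay off.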

Fender2112 · Aug 3, 2003

One analogy that stuck with me described the 970 like this: the P4 has a long and narrow pipeline. The G4 has a short and wide pipeline. By comparison, the G5 (970) has a long and wide pipeline. I don't know what this means in terms of stages or in-flight instructions, but this description gave me a mental image that makes the G5 seem like the best of both designs.

acj · Aug 3, 2003

Fender:

I think that's fairly accurate. Time will tell. Most of us haven't actually used a G5.

Catfish_Man · Aug 3, 2003

Originally posted by Fender2112
One analogy that stuck with me described the 970 like this: the P4 has a long and narrow pipeline. The G4 has a short and wide pipeline. By comparison, the G5 (970) has a long and wide pipeline. I don't know what this means in terms of stages or in-flight instructions, but this description gave me a mental image that makes the G5 seem like the best of both designs.

This is true, but a number of compromises elsewhere in the design were made to achieve this. Tracking the execution of 200+ instructions individually would be prohibitively difficult, so they divided them into groups of 5 and tracked 40 groups instead. This keeps the complexity down to a manageable level, but adds a number of restrictions on how instructions can be dispatched and retired. Overall, I think this was a good tradeoff (2 integer, 2 floating-point, 2 load/store, and 4 vector units, with a long pipeline, is very impressive), but it's going to be a bitch for the compiler writers.
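The group-tracking tradeoff can be sketched like this. The grouping rule here is an assumption made for illustration (a branch always closes its group), not the 970's documented dispatch rules; the only figures taken from the post are the group size of 5 and the 40 tracked groups. The point is that 40 × 5 = 200 is a ceiling: when groups can't be filled, fewer instructions are actually in flight.

```python
def form_groups(stream: list[str], group_size: int = 5) -> list[list[str]]:
    """Pack an instruction stream into dispatch groups of at most `group_size`.

    Illustrative rule (an assumption, not the 970's exact rules): a branch
    always ends the current group.
    """
    groups: list[list[str]] = []
    current: list[str] = []
    for op in stream:
        current.append(op)
        if len(current) == group_size or op == "branch":
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


MAX_GROUPS = 40   # groups the core can track at once (figure from the post)

branchy = form_groups(["add", "load", "branch"] * 100)   # a branch every 3rd instruction
straight = form_groups(["add"] * 300)                    # no branches at all

print(len(branchy), sum(len(g) for g in branchy[:MAX_GROUPS]))    # 100 120
print(len(straight), sum(len(g) for g in straight[:MAX_GROUPS]))  # 60 200
```

Under this toy rule, branch-heavy code leaves 40 percent of the tracking capacity empty, which is the kind of restriction a compiler would have to schedule around.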

Mav451 · Aug 3, 2003

Hey Fender: so what kind of pipeline do the Athlon XP and Opteron have?

I'm just wondering, since I went to that Ars Technica site and didn't understand a single word of what they said.

Fender2112 · Aug 3, 2003

Originally posted by Mav451
Hey Fender: so what kind of pipeline do the Athlon XP and Opteron have?

I'm just wondering, since I went to that Ars Technica site and didn't understand a single word of what they said.

Those are clogged pipelines. You want to borrow some of my Drano?

Seriously though, I don't know. This was an analogy that helped explain the difference between PPC and x86. That Ars Technica article is a bit above my head, but I did follow the gist of it.
