Originally posted by ddtlm
macrumors12345:
I don't think the 1.7ghz Power4+ compares well performance-wise core-vs-core to the 1.6ghz Itanium2
There is no such thing as a 1.6 GHz Itanium 2, so I am sure that a Power4+ compares very well against it, since it scores 0 on every benchmark!
On the single-threaded SPEC benchmarks, the 1.5 GHz Itanium 2 scores either 1075 or 1300 on SPECint2000 (depending on whether you use SGI's, Dell's, or HP's numbers) and between 1875 and 2100 on SPECfp2000. The 1.7 GHz Power4+ scores 1100 on SPECint2000 and 1700 on SPECfp2000. So, according to SPEC, a single Power4+ core is about 5% slower in integer and 15% slower in floating point than an Itanium core. Note, though, that the Itanic scores appear to reflect the publicized "179.art cheat" that Sun found, whereas the Power4+ scores appear to be playing it straight. Nevertheless, the Itanic is slightly faster in single-threaded code, but the key word there is "slightly." I'm sure the two cores will keep trading the lead in single-threaded benchmarks with each new revision (note that Itanic was revised more recently than Power), as they have for the past couple of years. On a strictly core-to-core comparison I consider the two to be pretty competitive with each other, and I will say the same thing when Power is slightly ahead of Itanic; perhaps you just consider a 10% gap to be much larger than I do.
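The percentage gaps come from simple ratios of the scores quoted above. This short Python sketch (variable and function names are just for illustration) computes how far the Power4+ trails at each end of the vendor range and at its midpoint. The midpoints work out to about 7% for integer and 14.5% for floating point, in the same ballpark as the rough figures above.

```python
# Single-threaded SPEC CPU2000 scores as quoted in the post.
# Itanium 2 (1.5 GHz) is a (low, high) range because vendors report different numbers.
ITANIUM2_INT = (1075, 1300)
ITANIUM2_FP = (1875, 2100)
POWER4P_INT = 1100  # 1.7 GHz Power4+
POWER4P_FP = 1700


def gap_percent(power: float, itanium: float) -> float:
    """Percent by which Power4+ trails Itanium 2; negative means Power4+ is ahead."""
    return (1.0 - power / itanium) * 100.0


def gap_summary(power: float, itanium_range: tuple[float, float]) -> tuple[float, float, float]:
    """Return the gap against the low, high, and midpoint Itanium 2 scores."""
    low, high = itanium_range
    mid = (low + high) / 2
    return gap_percent(power, low), gap_percent(power, high), gap_percent(power, mid)


for name, power, itanium_range in [
    ("SPECint2000", POWER4P_INT, ITANIUM2_INT),
    ("SPECfp2000", POWER4P_FP, ITANIUM2_FP),
]:
    vs_low, vs_high, vs_mid = gap_summary(power, itanium_range)
    print(f"{name}: {vs_low:+.1f}% to {vs_high:+.1f}% (midpoint {vs_mid:+.1f}%)")
```

Running it prints a SPECint2000 gap of -2.3% to +15.4% (midpoint +7.4%) and a SPECfp2000 gap of +9.3% to +19.0% (midpoint +14.5%). In other words, against the lower Itanium integer figure the Power4+ is actually slightly ahead.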
The important question, however, is whether these chips are used to run just one thread. The answer is no. These are first and foremost server chips, and nobody is going to run a server with just one thread active. The chips also see some use in workstations, where you might care only about single-threaded performance, but even there it is unlikely: over 80% of HP's Madison workstation configurations come with dual processors (see http://www.hp.com/workstations/itanium/zx6000/reseller.html). So, in summary, SPEC CPU2000 is clearly *not* the benchmark you want to use to evaluate these chips.
and I don't think dual-core or die space considerations matter.
Wow, then you had better tell IBM, Sun, Intel, and HP, because their long-term plans are all heavily focused on producing multicore chips!
It may cost Intel a lot more to fab two Itanium2's than it costs IBM to fab a Power4+, but I don't think Intel cares at this point (still in the massive bleeding of cash stage).
If Intel doesn't care about costs, then why don't they just double the L3 cache from 6 MB to 12 MB? I'm sure it's technologically possible, and it would undoubtedly crank those SPECfp scores up a bit higher. Sure, it might cost more, but costs are irrelevant, right?
Look, in the end it is all about trade-offs. If IBM wanted to, they could replace the second core on Power4+ with more cache, and the single-threaded SPECfp2000 scores would shoot up even higher (that benchmark, as you probably know, is highly dependent on memory bandwidth). But IBM (and by implication its customers) feels that those transistors are better spent on a second core, because they don't want to run just one thread at a time. Intel and HP implicitly agree, since they are trying pretty hard to get a dual-core Itanic out the door sooner rather than later.
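To make the cache-versus-second-core trade-off concrete, here is a deliberately crude model in Python. It assumes each core delivers its single-threaded SPECfp2000 score (used here only as a stand-in for per-core speed), that threads scale perfectly up to the core count with no cache or memory-bandwidth contention, and that extra threads beyond the core count add nothing. The "more cache" variant and its 20% speedup are purely hypothetical numbers for illustration, not anything IBM built or published.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    name: str
    cores: int
    per_core_score: float  # single-threaded score, assumed identical on every core

    def throughput(self, active_threads: int) -> float:
        """Idealized chip throughput: number of busy cores times per-core score.

        Assumes perfect scaling (no shared-cache or memory-bandwidth contention)
        and that threads beyond the core count add nothing.
        """
        busy_cores = min(active_threads, self.cores)
        return busy_cores * self.per_core_score


# Per-core scores are the SPECfp2000 figures quoted in the post
# (midpoint of the Itanium 2 range).
itanium2 = Chip("Itanium 2 (Madison)", cores=1, per_core_score=1987.5)
power4p = Chip("Power4+", cores=2, per_core_score=1700.0)

# Hypothetical alternative: one Power4+ core plus the extra cache that the
# second core's transistors could have bought. The 20% single-thread gain
# is an invented illustration, not a measured or published figure.
CACHE_SPEEDUP = 1.2
power4p_big_cache = Chip("Power4+ (1 core, more cache)", cores=1,
                         per_core_score=1700.0 * CACHE_SPEEDUP)

for threads in (1, 2, 4):
    row = ", ".join(f"{c.name}: {c.throughput(threads):.0f}"
                    for c in (itanium2, power4p, power4p_big_cache))
    print(f"{threads} thread(s) -> {row}")
```

With one thread the hypothetical big-cache chip comes out ahead of the dual-core Power4+, but as soon as two or more threads are active the dual-core chip pulls well ahead, which is exactly the bet described above. Real workloads scale less than perfectly, so the dual-core numbers are an upper bound.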
There is also a subtle point that many people miss when comparing single-threaded SPEC scores between Itanic, a VLIW-style (EPIC) processor, and a more "conventional" RISC processor like Power4. Itanium tries to wring everything it can out of instruction-level parallelism (ILP), whereas Power4 focuses on getting performance from thread-level parallelism. Because SPEC runs only one thread, Itanium is allowed to exploit its kind of parallelism in this benchmark, while Power4 is prevented from exploiting its own. A better way to look at it is from the standpoint of the problem. To the extent that a task is inherently non-parallelizable, neither Power4 nor Itanium is likely to do very well at it. But to the extent that the task is parallelizable, Itanium gets to use its ILP advantage in SPEC, while Power4 is not allowed to use its thread-level advantage. In reality, of course, when a problem is parallelizable, any decent programmer would try to exploit both ILP and thread-level parallelism.
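Amdahl's law (a standard model, not something the post invokes by name) captures the point about parallelizability: if a fraction p of the work can be spread across n threads, the best-case speedup over one thread is 1 / ((1 - p) + p/n). A single-threaded SPEC run fixes n = 1, so a second core contributes nothing no matter how parallel the task is. This sketch ignores ILP, which speeds up each individual thread on both chips, and the sample values of p are arbitrary.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Amdahl's law: best-case speedup over one thread when only
    `parallel_fraction` of the work can be split across `threads` threads."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if threads < 1:
        raise ValueError("threads must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / threads)


for p in (0.0, 0.5, 0.9):
    # threads=1 mirrors a single-threaded SPEC run; threads=2 uses both Power4+ cores
    one = amdahl_speedup(p, 1)
    two = amdahl_speedup(p, 2)
    print(f"p={p}: 1 thread -> {one:.2f}x, 2 threads -> {two:.2f}x")
```

A fully serial task (p = 0) gains nothing from the second core, while a task that is 90% parallelizable gets about 1.82x from two threads, none of which shows up in a single-threaded benchmark.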