Getting really off topic...
I know you weren't replying to me, but I am reading your comment at the end to imply that I am responsible for "some of the crap that is being posted today".
Originally posted by Cubeboy
Did I ever say "special tweaks"? NO, you just assumed I did and made a circular argument out of it right?
Actually, a different poster referred to "special tweaks". If you look at the previous postings you will note the exact term he used was that GCC has "a ton of special tweaks written by chip vendor IBM." (a pretty damning quote!) I disputed that, you jumped on me, a benchmark war ensued, and the guy you're flaming defended my statements and got confused as to who said what.
This mistake is understandable as you took what I said out of context when flaming me.
My position on the machine description file has already been stated. The original arguments I've seen took IBM to task over an incorrect machine description file because it was believed that the changes in the file were deliberate FUD on behalf of Apple; it had nothing to do with IBM's right, as a chip vendor, to modify an MD file to improve performance. This was later misinterpreted during the "oft-cited, much-maligned" Veritest benchmark war. It turned out that the file needed to be tweaked because gcc's CPU model is not a good model for the 970. IBM's explanation for the late change sounds very reasonable: it was roughly equivalent to "we were tweaking it because a MD file isn't as easy as looking at a spec sheet as these people would have you believe."
(You and I both know that there is a world of difference between tweaking a machine description file and tweaking the GCC compiler's CPU model itself. We both know that the P4's MD file has far more commits than the 970's.)
I ignored this stuff because it is true and I've never disputed this. We don't know why an optimal scheduler isn't available for the P4 in GCC. There is no need to shout about this and drive an honest discussion into the uncivil.
Even if such a thing is submitted we don't know what the improvement will be. I'm inclined to believe that there are a number of factors in addition to the lack of documentation coming out of Intel. Remember, the P4 of today is a completely different beast inside than the earlier models (hyperthreading, etc): these might not map well to the GCC CPU model either.
The difference is that IBM is committed to convincing GCC to improve the model and Intel isn't. This may mean that in the future GCC may become biased toward PPC, but that doesn't refute the present reality that GCC is a better compiler for x86 than PPC. (I'm on record, many times, as believing that using GCC to "normalize out" the compiler for SPEC is misguided: SPEC is a system benchmark that tests CPU, memory, and compiler (and, to a lesser extent, bus, OS and other components)). It always bothered me that the average computer user has even heard of it when making a purchase.
Okay this just flat out wrong, almost all the time, gcc is tuned by people from the CPU vendor for a particular CPU, Intel does NOT do this, they would rather spend their time tuning ICC and for good reason.
Actually, his statement is correct. The amount of tuning that Apple/IBM have done for the 970 does "pale" in comparison to the amount of tuning done for the x86. As evidence, note how good the performance of gcc3.3 is vs. ICC for the P3. In the PowerPC/POWER world, gcc has not achieved close to parity with CodeWarrior or VisualAge, and there are far fewer developers working on and with gcc for the PowerPC. Again, this will change because of IBM's stated commitment to open source and Apple's obvious dependence on gcc as the only Objective-C compiler around (weighed against the GCC team's biases against accepting any changes which affect portability).
I'd even bet that more time was spent tuning for the P4 specifically than the 970 in gcc. The P4 has been out for a while and is probably the second most used CPU (after the P3) for Linux. The problem here is a noticeable lack of documentation on how to go about doing such tuning. The trick of having hyperthreading double the apparent number of CPUs available to the kernel alone must have taken a good bit of time. It was a hack that has since been fixed, but that doesn't mean it didn't take a lot of time.
Looking over the documentation, on ICC/IFC 5.0, SSE2 only improved performance 5% over x87 only code in SPECfp.
Whoa, you (and AMDZone) are misreading your own cites. First, the 5% performance gain is specifically due to instructions added when they jumped between MMX/MMX2 (SIMD in Pentium II and III) and SSE/SSE2 (SIMD in Pentium IV). Second, the fact that there is a gain at all points to autovectorization being done. (Now I agree with you that I've been guilty of referring to autovectorization when I generically mean autovectorization, autoparallelization, and other CPU modelling optimizations.) Third, this performance gain will increase in later versions of ICC as the Intel folks figure out more places the new instructions create benefits.
My god, have your EVER seen how a Pentium 4 performs on ICC and GCC, I suppose not, you pulled this little statement out of thin air just like the rest of your "FUD".
Those are more examples that reinforce the Pixar statement (50% speed gain on P4 with ICC vs. GCC). The optimizer in ICC is really good (by those statements) and really mature (by the fact that Intel's compiler shows better results than GCC even with AMD's chips). I should note for the others not willing to sift through all your cites that there are a couple of tests where GCC benchmarked in rough parity with or better than ICC.
I never claimed that the ICC optimizations only benefit SPEC (others may have). My guess is that the biggest gains are not AV or AP at all but come from the use of a lookup table for trigonometric functions in ICC. LibMoto (a math library Motorola made for the PowerPC) used to do the same thing and would pump up the FP marks in old Mac benchmarks by 80%, but because of the incompleteness of the tables, it would affect the stability of some video games which depended on the accuracy of the numbers. I find it doubtful that the GCC team would accept such changes even if they were offered.
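For anyone who hasn't seen the trade-off I mean, here is a rough sketch in Python. To be clear, this is not how ICC or LibMoto actually work (I don't have their source); the table size, the nearest-entry lookup and the names table_sin and max_table_error are all made up for illustration. It only shows the general idea: a precomputed table replaces the expensive call, and the price is accuracy.

```python
import math

# Hypothetical table size; a real math library would choose its own
# size and would likely interpolate rather than take the nearest entry.
TABLE_SIZE = 256
STEP = 2 * math.pi / TABLE_SIZE
SINE_TABLE = [math.sin(i * STEP) for i in range(TABLE_SIZE)]


def table_sin(x):
    """Approximate sin(x) with the nearest precomputed table entry."""
    # Reduce x to [0, 2*pi), pick the closest table slot, and wrap
    # slot TABLE_SIZE back to slot 0.
    index = round((x % (2 * math.pi)) / STEP) % TABLE_SIZE
    return SINE_TABLE[index]


def max_table_error(samples=10_000):
    """Largest absolute difference from math.sin over evenly spaced samples."""
    worst = 0.0
    for k in range(samples):
        x = k * 2 * math.pi / samples
        worst = max(worst, abs(table_sin(x) - math.sin(x)))
    return worst
```

With a nearest-entry lookup the error is bounded by half the table step (about 0.012 for 256 entries), which is invisible in a throughput benchmark but is exactly the kind of inaccuracy that code depending on precise results can trip over.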
Some of your cites actually reinforce a significant speed gain with ICC vs. GCC regarding P4 SPEC2000. That does reinforce my statement that 1) You cannot "normalize out" the compiler in SPEC benchmarking as Apple claims and 2) it is misleading to report Apple's SPEC numbers side-by-side with standard SPEC2000 benchmarks.