Summary of my posts on SlashDot.
Reformatted for MacRumors.
First, the G5 Terascale cluster at Virginia Tech, at #3 (10.28 Tflops/s, 2200 CPU, Infiniband), is the first academic computer to break 10 teraflops/s. This extra performance was promised at the Mac OS X developer conference last month. Not too sure if the price is a testament to Infiniband ($1.5 million for cabling, cards, and routers) or the Macs ($4.2 million list price*).
Good thing too, because in a surprise move the NCSA cluster made the list at #4 (9.82 Tflops/s, 2500 CPU, Myrinet) and would have beaten Terascale's previously reported 9.555 Tflops/s. This cluster is built using Dells running Pentium 4 Xeons and Red Hat Linux. One subtle point to note is that they didn't get all the systems online in time (there should be 2900 CPUs, not 2500). I bet the PSC who coded déjà vu and an ex-Chief Scientist of SDSC are appreciating having a hand in edging out their arch-rival NCSA for #3, not to mention Apple beating Dell.
The fastest Itanium cluster is at #5 (8.63 Tflops/s, 1936 CPU, Quadrics), which is looking like the odd man out, boxed in by PC-based systems using Linux and Myrinet networking: the P4 Xeon above and the most powerful Opteron system at #6 below (8.05 Tflops/s, 2816 CPU, Myrinet).
And finally, it's easy to overlook #73, a single compute node of BlueGene/L (1.44 Tflops/s, 1024 CPU). Imagine 128 of these connected together and you have something that will easily take #1 when it's completed, even if we handicap it 20-40%. As noted on Slashdot earlier, this is also on Linux. But Mac users should note that its CPU is based on the PowerPC architecture.
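To put a number on that BlueGene/L guess, here is a quick back-of-the-envelope sketch. It assumes performance scales linearly with node count and then applies the 20-40% handicap as a flat discount; the function name and the linear-scaling assumption are mine, not anything IBM has published.

```python
# Figures quoted in the post: one BlueGene/L node at #73 and the planned node count.
NODE_TFLOPS = 1.44
NODES = 128


def projected_tflops(handicap):
    """Linear extrapolation to the full machine, reduced by a fractional handicap.

    handicap is a fraction, e.g. 0.2 for a 20% loss to scaling overhead.
    This is an assumption-laden estimate, not a measured result.
    """
    return NODE_TFLOPS * NODES * (1.0 - handicap)


for handicap in (0.0, 0.2, 0.4):
    print(f"handicap {handicap:.0%}: {projected_tflops(handicap):.1f} Tflops/s")
```

Even with the 40% handicap this comes out above 110 Tflops/s, more than ten times the 10.28 Tflops/s of the #3 machine.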
Note that this Mac cluster is no longer 10x cheaper than its peers, since #4-6 were built using the Pentium 4 Xeon, Itanium, and Opteron respectively. According to Virginia Tech, which priced the latter two, those systems would cost around $9-10 million, twice as much. Not absolutely sure how much the P4 Xeon system cost, but given the number of CPUs used, I'd say it's not price-competitive with the Mac. For instance, you can go to Dell's website and price the same 1250 machines for $6.7-8.4 million (the upper price includes Red Hat and 1MB cache). Where is the vaunted value in IA32 now? Therefore, of the four, the G5 (970) offers the highest flops/CPU, the second most flops/cycle (its Rpeak per cycle is the same as the Itanium's, but it's not as efficient, so its Rmax gets edged out), and untouchable price/performance. (You also get a DVD burner, a good video card, optical in/out, FireWire, and a historically high resale value when you want to upgrade your systems.)
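For anyone who wants to check the flops/CPU and price/performance claims, here is a rough sketch using only the numbers quoted in this post. The big assumptions: the prices are the list-price estimates above (machines only, no Infiniband for the G5), and the $9-10 million figure is applied to the #5 and #6 systems as if they had been bought at that price, which is a simplification.

```python
# (Rmax in Tflops/s, CPU count) as quoted in the post for #3 through #6.
systems = {
    "G5 (Virginia Tech)": (10.28, 2200),
    "P4 Xeon (NCSA)": (9.82, 2500),
    "Itanium": (8.63, 1936),
    "Opteron": (8.05, 2816),
}

# (low, high) machine prices in $ millions, as quoted in the post.
# These are list-price estimates, not what anyone actually paid.
prices = {
    "G5 (Virginia Tech)": (4.2, 4.2),
    "P4 Xeon (NCSA)": (6.7, 8.4),
    "Itanium": (9.0, 10.0),
    "Opteron": (9.0, 10.0),
}


def gflops_per_cpu(rmax_tflops, cpus):
    """Sustained LINPACK Gflops/s delivered per CPU."""
    return rmax_tflops * 1000.0 / cpus


def dollars_per_gflops(price_millions, rmax_tflops):
    """Dollars of machine price per sustained Gflops/s."""
    return price_millions * 1e6 / (rmax_tflops * 1000.0)


for name, (rmax, cpus) in systems.items():
    low, high = prices[name]
    print(f"{name:20s} {gflops_per_cpu(rmax, cpus):5.2f} Gflops/CPU  "
          f"${dollars_per_gflops(low, rmax):,.0f}-"
          f"{dollars_per_gflops(high, rmax):,.0f} per Gflops")
```

With these inputs the G5 comes out on top in both columns, at roughly 4.7 Gflops/s per CPU and a bit over $400 per sustained Gflops/s, but the ranking is only as good as the price estimates fed in.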
As noted on Mac Rumors earlier, IBM will be introducing G5-based blades in the same vein as their P4 Xeon ones. Since IBM supports the Opteron and Itanium, I hope they're soon to follow. Imagine choosing any combination of 4 different CPU families in your blade center. How's that for business flexibility! (Now if Apple licenses Mac OS X Server to IBM, my dream comes true.)
One person claimed that Apple did a lot of assembly language hacking. Hardly! I'm sure the Apple sales people moved mountains to get those systems shipped on time, not to mention the work done making drivers for the Infiniband cards, but the coding was mostly one man working two months to port some software, plus various libraries from people such as Professor Goto. I'm sure the other systems had as much hacking done, probably more.
The Opteron is a horrible performer in this benchmark. This is one to rub in the face of anyone who blabs about Opteron, but please do it right. Simply put, the Rpeak of the 2 GHz G5 is twice that of the 2 GHz Opteron, at 4 flops/cycle vs 2 flops/cycle (8 Gflops/s vs 4 Gflops/s). Note though that the Opteron may suffer less drop-off from the peak, but it's not going to be enough to make up for that factor of two. The Itanium 2 certainly does suffer less drop-off than the G5, and it has the same flops/MHz. The problem is, it peaks out at 1.5 GHz. What then when faster 970s launch early next year and again see a big bump due to going 90nm?
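The peak-versus-drop-off argument can be sketched numerically. Rmax and CPU counts are the figures quoted earlier in this post; the clock speeds of the listed systems and the per-cycle peak (4 floating-point operations for the G5 and Itanium 2, 2 for the Opteron) are my assumptions for illustration, so treat the percentages as approximate.

```python
# Rmax (Tflops/s) and CPU counts are the figures quoted in the post.
# Clock speeds and flops/cycle are assumptions made for this sketch.
systems = {
    # name:      (Rmax, CPUs, GHz, flops/cycle)
    "G5":        (10.28, 2200, 2.0, 4),
    "Itanium 2": (8.63, 1936, 1.5, 4),
    "Opteron":   (8.05, 2816, 2.0, 2),
}


def peak_gflops(ghz, flops_per_cycle):
    """Theoretical peak (Rpeak) per CPU in Gflops/s."""
    return ghz * flops_per_cycle


def sustained_gflops(rmax_tflops, cpus):
    """Measured LINPACK result (Rmax) per CPU in Gflops/s."""
    return rmax_tflops * 1000.0 / cpus


def efficiency(rmax_tflops, cpus, ghz, flops_per_cycle):
    """Fraction of theoretical peak actually sustained (Rmax / Rpeak)."""
    return sustained_gflops(rmax_tflops, cpus) / peak_gflops(ghz, flops_per_cycle)


for name, (rmax, cpus, ghz, fpc) in systems.items():
    print(f"{name:10s} peak {peak_gflops(ghz, fpc):4.1f}  "
          f"sustained {sustained_gflops(rmax, cpus):4.2f}  "
          f"efficiency {efficiency(rmax, cpus, ghz, fpc):.0%}")
```

Under these assumptions the Opteron does keep a larger share of its peak than the G5, but it starts from half the peak, so it still sustains fewer Gflops/s per CPU.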
BTW, Altivec/VMX/Velocity doesn't get involved here because it can't do double precision multiply-adds. Whenever there is something optimized for the Altivec, it's impressive (a Slashdotter mentioned the P4's performance at distributed.net and had it thrown back in his face when stats showed G5s outperformed them by a factor of 3 or more, due to these programs being ideal cases for Altivec optimization). Note that LINPACK (the benchmark used by the Top500) is very dependent on network speed, which is why I keep mentioning the network when I mention the stats above.
If you can use Altivec, the G5 usually beats the Opteron, while the Opteron beats the G5 for non-vectorizable integer performance. I still think the Opteron is probably the best web server chip around and perhaps the best price/performance database chip, if that means anything.
Finally, heat isn't an issue. First, the 2 GHz G5 generates 47 watts of heat, which is half that of the P4 at 3 GHz. Second, IBM introducing G5 blades shows that heat isn't an issue. The problem for Virginia Tech was that IBM, even though it was the first choice, wouldn't have had a machine ready in time for the fall Top500. Given the stiff competition for #3-6 (all new machines), you can see why it was important that they ship this year.
I don't know what the "self-made" thing means, but I don't think it's related to cost, like some pundits claim. Remember, Apple, using the list price, beat out all other bids tendered by a factor of two. The only bid in question was the Dell Itanium system that fell through when "Dell was exploring pricing options," whatever that means (IMO, it probably means Dell knows nothing about building clusters based on 64-bit chips and should stick to ripping people off on their 32-bit systems, no matter how desperate they were to undercut Apple).
Finally, with a little wryness, I think Apple needs to give a big thank you to Intel, Dell, HP, IBM, and Sun for creating Infiniband and making Apple's entry into the supercomputer marketplace possible.
* Note $4.2 million is probably the education list price when spec'd with 2GB and accounting for spare machines. The actual list price for 1100 machines is around $4.4 million.