Virginia Tech PowerMac Cluster Ranks 3rd

MacRumors · Nov 16, 2003

Results from the 22nd edition of the TOP500 List of the World's Fastest Supercomputers were released today.

As expected, Virginia Tech's PowerMac G5 Cluster has been officially ranked 3rd fastest in the world, behind the Earth Simulator Center (#1) and the Los Alamos National Laboratory (#2). The updated list can be found at Top500.org.

More PowerMac G5 clusters are expected to be assembled in the future, as Virginia Tech plans on releasing detailed plans on the construction of their cluster.

Rocketman · Nov 16, 2003

Data

Data

Virginia Polytechnic cluster
2200 IBM PowerPC 970 in Apple G5 2.0 Ghz processors
10.3 Tflops, 0.00468 Tflop/processor
Apple (Ranked 3 in world)
OSX

National Center for Supercomputing cluster
2500 Intel Xeon 3.06 Ghz processors
9.8 Tflops, 0.00392 Tflop/processor
Dell (Ranked 4 in world)
Red Hat
Dell/Intel 16.2% slower per processor than Apple/IBM

Pacific Northwest cluster
1936 Intel Itanium 2 1.5 Ghz
8.6 Tflops, 0.00444 Tflop/processor
HP (Ranked 5 in world)
Red Hat
HP/Intel 5.1% slower per processor than Apple/IBM
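
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the per-processor comparison. It uses only the Tflops and processor counts posted above; the function names are made up for illustration. Rounding the per-processor figure to five decimal places first, as the post does, appears to be what yields the 16.2% and 5.1% figures.

```python
# Figures as posted above: (label, sustained Tflops, processor count).
CLUSTERS = [
    ("Virginia Tech (Apple/IBM)", 10.3, 2200),
    ("NCSA (Dell/Intel Xeon)", 9.8, 2500),
    ("Pacific Northwest (HP/Intel Itanium 2)", 8.6, 1936),
]


def tflops_per_processor(tflops, processors):
    """Sustained Tflops per processor, rounded to 5 places like the post."""
    return round(tflops / processors, 5)


def percent_slower(other, baseline):
    """How much slower `other` is than `baseline`, as a percent of baseline."""
    return round((baseline - other) / baseline * 100, 1)


# Virginia Tech is the baseline every other cluster is compared against.
baseline = tflops_per_processor(CLUSTERS[0][1], CLUSTERS[0][2])
for label, tflops, processors in CLUSTERS:
    per_cpu = tflops_per_processor(tflops, processors)
    print(f"{label}: {per_cpu:.5f} Tflop/processor, "
          f"{percent_slower(per_cpu, baseline)}% slower than Virginia Tech")
```

Keep in mind this only divides a sustained benchmark score by a CPU count; it says nothing about interconnect, clock speed, or price.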

Rocketman

Puts UK advertising in context, eh?
"I told you so" - Steve Jobs in an insanely great moment.

MattG · Nov 16, 2003

Wonder how many DP G5's linked together it'd take to make it the #1 Supercomputer?

Freg3000 · Nov 16, 2003

I soooooo want a Virginia Tech Switcher commercial, even though that campaign is probably over and VT didn't really switch, they just bought a lot of G5s. But how cool would that be?

"We are Virginia Tech, and.....We have the 3rd fastest computer in the world."

And the British wouldn't be able to ban that one, because it would be 100% true. 😀

adelaney · Nov 16, 2003

self made supercomputer

I think it's also significant that there is only one other self-made supercomputer in the top 100, at position 67. VT is going to release how it made the thing and everything you need to know to make your own. Presumably with all the others you just have to contact the manufacturer and accept the price they give you for the whole shebang. Obviously you'd be able to customize the thing to your needs, but not as much as if you were able to build it yourself.

Rincewind42 · Nov 16, 2003

Originally posted by MattG
Wonder how many DP G5's linked together it'd take to make it the #1 Supercomputer?

At least 2560 Dual PMG5s, as that would match the Peak of the Earth Simulator (yes, the G5 matches the Earth Simulator's peak flop rating cpu-for-cpu). Accounting for overhead, probably closer to 3000 machines however.
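
The 2560 figure can be reproduced with a quick sketch. The constants below are assumptions that are not stated in the thread: that the PowerPC 970 can retire 4 double-precision flops per cycle, and that the Earth Simulator's theoretical peak is 40.96 Tflops. This only matches peak ratings; it does not model the overhead mentioned above.

```python
import math

# Assumptions (not stated in the thread):
FLOPS_PER_CYCLE = 4                     # PowerPC 970, double precision
CLOCK_GHZ = 2.0                         # 2.0 GHz G5
CPUS_PER_MACHINE = 2                    # dual-processor Power Mac G5
EARTH_SIMULATOR_PEAK_GFLOPS = 40960.0   # 40.96 Tflops theoretical peak


def peak_gflops_per_machine(clock_ghz=CLOCK_GHZ,
                            flops_per_cycle=FLOPS_PER_CYCLE,
                            cpus=CPUS_PER_MACHINE):
    """Theoretical peak of one machine: clock x flops/cycle x CPU count."""
    return clock_ghz * flops_per_cycle * cpus


def machines_to_match(target_peak_gflops):
    """Smallest whole number of machines whose combined peak reaches the target."""
    return math.ceil(target_peak_gflops / peak_gflops_per_machine())


print(machines_to_match(EARTH_SIMULATOR_PEAK_GFLOPS))
```

Under those assumptions each dual G5 peaks at 16 Gflops, which is where 2560 machines comes from.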

~Shard~ · Nov 16, 2003

The Next Supercomputer?

So which university is going to order 1100 Dual 2.5 GHz G5 PowerMacs when the speed boosts are announced in January @ MWSF? 😉

MacRETARD · Nov 16, 2003

THE ITANIC IS FASTER!

It's only running at 1.5 GHz and is 5% slower per CPU while the G5 is running at 2 GHz!

OK, sorry, I'm just picturing what some zealot would say if there was a P4 cluster running at 3.2 GHz that was ranked higher.

I would like to see how an Opteron rates; we should know soon. AMD sold a 10,368 CPU cluster to Sandia National Labs.

Rincewind42 · Nov 16, 2003

Originally posted by Freg3000
I soooooo want a Virginia Tech Switcher commercial, even though that campaign is probably over and VT didn't really switch, they just bought a lot of G5s. But how cool would that be?

Actually, I can see it now. Walk with me...

Commercial fades in with the title "My First Super Computer". A montage of the VT volunteers putting together the cluster, copious shots of the PowerMac G5 throughout, and a real home movie feel. Then towards the end we switch to an awards ceremony taking place, with an announcer reading off the top 5 (incoherent until we reach #3). Then we hear the VT cluster, rank 3 in the world. The Dell cluster, rank 4 in the world, then fade out to incoherence for the rest of the ranking. Jeff Goldblum comes in and narrates over a spinning G5.

"Why don't you start building your own Super Computer?", fade to Apple Logo, fade out.

Rocketman · Nov 16, 2003

Re: The Next Supercomputer?

Originally posted by ~Shard~
So which university is going to order 1100 Dual 2.5 GHz G5 PowerMacs when the speed boosts are announced in January @ MWSF? 😉

Whoever spends a mere $10m on 2200 of the more likely 2.6 GHz systems with an upgraded FSB speed will be tickling #1 in the world, and with first-generation 970 G5s at that.

Computers for the rest of us indeed.

And just in case you are confused, $10m is dirt cheap as compared to the top 5, last ranking. Mellanox deserves supreme kudos as well for Infiniband.

This is all off the shelf stuff guys and anyone can get a few students together with free pizza and cola!

Thank you pizza and cola, the Apple accessory of choice for supercomputer installers!

Question: What if someone just spends a traditional supercomputer budget of $25-40m and sees where that leads?

Rocketman

Rincewind42 · Nov 16, 2003

Re: THE ITANIC IS FASTER!

Originally posted by MacRETARD
It's only running at 1.5 GHz and is 5% slower per CPU while the G5 is running at 2 GHz!

OK, sorry, I'm just picturing what some zealot would say if there was a P4 cluster running at 3.2 GHz that was ranked higher.

I would like to see how an Opteron rates; we should know soon. AMD sold a 10,368 CPU cluster to Sandia National Labs.

Yeah, the Itanium is a floating point monster, same peak as the 970 in fact. Likely the difference is that they had more time to optimize their cluster vs the VT cluster, as they are at 74% efficiency vs VT at 58%. They'll get another chance to benchmark next year, so who knows what we'll see then. As for the Opteron cluster, I suspect that it has the same peak as the 970, but I don't know how it performs - all the benchmarks I've seen so far have shown Opteron vs 970 to be a wash.

RalphNumbers · Nov 16, 2003

Re: Re: THE ITANIC IS FASTER!

Originally posted by Rincewind42
Yeah, the Itanium is a floating point monster, same peak as the 970 in fact. Likely the difference is that they had more time to optimize their cluster vs the VT cluster, as they are at 74% efficiency vs VT at 58%. They'll get another chance to benchmark next year, so who knows what we'll see then. As for the Opteron cluster, I suspect that it has the same peak as the 970, but I don't know how it performs - all the benchmarks I've seen so far have shown Opteron vs 970 to be a wash.

Actually a 2816-CPU Opteron 2 GHz cluster came in 6th, a pretty disappointing performance.

The 3 machines immediately behind Big Mac are interesting:

Rank 4
NCSA
United States/2003
Tungsten
PowerEdge 1750, P4 Xeon 3.06 GHz, Myrinet / 2500 processors
Dell
Rmax: 9819 Gflops
Rpeak: 15300 Gflops

Rank 5
Pacific Northwest National Laboratory
United States/2003
Mpp2
Integrity rx2600 Itanium2 1.5 GHz, Quadrics / 1936 processors
HP
Rmax: 8633 Gflops
Rpeak: 11616 Gflops

Rank 6
Los Alamos National Laboratory
United States/2003
Lightning
Opteron 2 GHz, Myrinet / 2816 processors
Linux Networx
Rmax: 8051 Gflops
Rpeak: 11264 Gflops

As you can see here http://www.top500.org/list/2003/11/
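
The "efficiency" figures being thrown around in this thread are just Rmax divided by Rpeak. Here is a small sketch using the three entries listed above (values in Gflops, copied from the listing); the labels and function name are only illustrative.

```python
# (name, Rmax in Gflops, Rpeak in Gflops), copied from the listing above.
SYSTEMS = [
    ("Tungsten (NCSA, P4 Xeon)", 9819, 15300),
    ("Mpp2 (PNNL, Itanium 2)", 8633, 11616),
    ("Lightning (LANL, Opteron)", 8051, 11264),
]


def efficiency(rmax, rpeak):
    """Fraction of theoretical peak actually sustained on the benchmark."""
    return rmax / rpeak


for name, rmax, rpeak in SYSTEMS:
    print(f"{name}: {efficiency(rmax, rpeak):.1%} of peak")
```

The Itanium 2 result works out to roughly 74%, which lines up with the figure quoted earlier in the thread.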

tazznb · Nov 16, 2003

Re: Data

Originally posted by Rocketman
Data

Virginia Polytechnic cluster
2200 IBM PowerPC 970 in Apple G5 2.0 Ghz processors
10.3 Tflops, 0.00468 Tflop/processor
Apple (Ranked 3 in world)
OSX

National Center for Supercomputing cluster
2500 Intel Xeon 3.06 Ghz processors
9.8 Tflops, 0.00392 Tflop/processor
Dell (Ranked 4 in world)
Red Hat
Dell/Intel 16.2% slower per processor than Apple/IBM

Pacific Northwest cluster
1936 Intel Itanium 2 1.5 Ghz
8.6 Tflops, 0.00444 Tflop/processor
HP (Ranked 5 in world)
Red Hat
HP/Intel 5.1% slower per processor than Apple/IBM

Rocketman

Puts UK advertising in context, eh?
"I told you so" - Steve Jobs in an insanely great moment.

(You should've added prices for all listed)

They should score really big on the price point.

Steve Jobs should hire MC Hammer, and have him rap to Intel & the rest saying "You can't touch this!"

zweigand · Nov 16, 2003

Re: Data

Originally posted by Rocketman
Data

Virginia Polytechnic cluster
2200 IBM PowerPC 970 in Apple G5 2.0 Ghz processors
10.3 Tflops, 0.00468 Tflop/processor
Apple (Ranked 3 in world)
OSX

National Center for Supercomputing cluster
2500 Intel Xeon 3.06 Ghz processors
9.8 Tflops, 0.00392 Tflop/processor
Dell (Ranked 4 in world)
Red Hat
Dell/Intel 16.2% slower per processor than Apple/IBM

Pacific Northwest cluster
1936 Intel Itanium 2 1.5 Ghz
8.6 Tflops, 0.00444 Tflop/processor
HP (Ranked 5 in world)
Red Hat
HP/Intel 5.1% slower per processor than Apple/IBM

Rocketman

Puts UK advertising in context, eh?
"I told you so" - Steve Jobs in an insanely great moment.

Actually, according to NCSA's website the cluster comprises 1450 Dell PowerEdge servers... and since each of those is a dual-processor system... do the math... that's 2900 procs, not 2500

might want to update your stats above 😉

greenstork · Nov 16, 2003

Re: Re: The Next Supercomputer?

Originally posted by Rocketman
Whoever spends a mere $10m on 2200 of the more likely 2.6 GHz systems with an upgraded FSB speed will be tickling #1 in the world, and with first-generation 970 G5s at that.

To quash this myth before it gets too out of control: you can't just add computers and expect a supercomputer to get that much faster. For every additional node on the system, the efficiency of the whole system decreases, so merely doubling the number of computers *will not* double the computing power. Since the decrease in efficiency is on an exponential scale, you'd have to increase the number of nodes exponentially to achieve the desired speed increase. Hope this clears things up a bit for all those pie-in-the-sky posts.

That said, this is excellent news, especially since Apple is squarely in front of the 4th place Dell system, woohooo 😀

theRebel · Nov 16, 2003

Re: Re: Data

Originally posted by zweigand
Actually, according to NCSA's website the cluster comprises 1450 Dell PowerEdge servers... and since each of those is a dual-processor system... do the math... that's 2900 procs, not 2500

might want to update your stats above 😉

Yeah, I had noticed that too.

However the stats posted here are the same as what top500.org has posted in their chart. It is possible that they may just have a typo though.

It is also possible that the November stats for the NCSA cluster were achieved with only 1250 systems, but that the NCSA is planning to add another 200. Their press release does not make it clear whether 1450 is how many they have now or how many they plan to have.

How can we find out which is right? The Top500.org chart or the NCSA press release?

AppleManEric · Nov 16, 2003

Can a node in the VT supercomputer be replaced with a faster model in the future? Or will having two different speeds of procs screw it up?

tychay · Nov 16, 2003

Summary of my posts on SlashDot.

Reformatted for MacRumors.

First, the G5 Terascale cluster at Virginia Tech at #3 (10.28 Tflops/s, 2200 CPU, Infiniband) is the first academic computer to break 10 teraflops/s. This extra performance was promised at the Mac OS X developer's conference last month. Not too sure if the price is a testament to Infiniband ($1.5 million cabling, cards, and routers) or the Macs ($4.2 million list price*).

Good thing too, because in a surprise move the NCSA cluster made the list at #4 (9.82 Tflops/s, 2500 CPU, Myrinet) and might have beaten Terascale's previously reported 9.555 Tflops/s. This cluster is built using Dells running Pentium 4 Xeons and Red Hat Linux. One subtle point to note is that they didn't get all the systems online in time (there should be 2900 CPUs, not 2500). I bet the PSC who coded déjà vu and an ex-Chief Scientist of SDSC are appreciating having a hand in edging out their arch-rival NCSA for #3--not to mention Apple beating Dell. 🙂

The fastest Itanium cluster is at #5 (8.63 Tflops/s, 1936 CPU, Quadrics), which is looking like the odd man out, boxed in by PC-based systems using Linux and Myrinet networking: the P4 Xeon above, and the most powerful Opteron system at #6 below (8.05 Tflops/s, 2816 CPU, Myrinet).

And finally, it's easy to overlook #73, a single compute node of BlueGene/L (1.44 Tflops/s, 1024 CPU). Imagine 128 of these connected together and you have something that will easily take #1 when it's completed even if we handicap it 20-40%. As noted on SlashDot earlier, this is also on Linux. But Mac users should note that its CPU is based on the PowerPC architecture.
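
As a rough sanity check on the BlueGene/L projection, here is a short sketch. It uses only the 1.44 Tflops and 128-node figures from the paragraph above and treats the "handicap" as a flat percentage lost to scaling, which is a simplifying assumption rather than a real scaling model.

```python
NODE_TFLOPS = 1.44   # the #73 entry quoted above
NODE_COUNT = 128     # number of such units imagined connected together


def scaled_tflops(handicap):
    """Combined throughput after losing `handicap` (0.0 to 1.0) to overhead."""
    return NODE_TFLOPS * NODE_COUNT * (1.0 - handicap)


for handicap in (0.0, 0.2, 0.4):
    print(f"{handicap:.0%} handicap: {scaled_tflops(handicap):.1f} Tflops")
```

Even at the 40% handicap the total stays above 100 Tflops, an order of magnitude beyond the 10.28 Tflops at #3.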

Note that this Mac cluster is no longer 10x cheaper than its peers since #4-6 were built using the Pentium 4 Xeon, Itanium, and Opteron respectively. According to Virginia Tech, which priced the latter two, the cost for those systems would be around $9-10 million, twice as much. Not absolutely sure how much the P4 Xeon system cost, but given the number of CPUs used, I'd say it's not price-competitive with the Mac. For instance, you can go to Dell's website and price the same 1250 machines for $6.7-8.4 million (the upper price includes Red Hat and 1MB cache). Where is the vaunted value in IA32 now? Therefore, of the four, the G5 (970) offers the highest flops/CPU, the second most flops/cycle (Rpeak is the same per cycle as Itanium but it's not as efficient so its Rmax gets edged out), and untouchable price/performance. (You also get a DVD burner, a good video card, optical in/out, FireWire, and a historically high resale value when you want to upgrade your systems.)

As noted on Mac Rumors earlier, IBM will be introducing G5-based blades in the same vein as their P4 Xeon ones. Since IBM supports the Opteron and Itanium, I hope they're soon to follow. Imagine choosing any combination of 4 different CPU families in your blade center. How's that for business flexibility! (Now if Apple licenses Mac OS X Server to IBM, my dream comes true.)

One person claimed that Apple did a lot of assembly language hacking. Hardly! I'm sure the Apple sales people moved mountains to get those systems shipped on time not to mention the work done making drivers for the Infiniband cards, but the coding was mostly one man working two months getting some software ported and various other libraries from people such as Professor Goto. I'm sure the other systems had as much, probably more, hacking done.

The Opteron is a horrible performer in this benchmark. This is one to rub in the face of anyone who blabs about Opteron, but please do it right. Simply put, the Rpeak of the 2 GHz G5 is twice that of the 2 GHz Opteron, at 8 Gflops/s vs 4 Gflops/s per CPU (4 vs 2 flops per cycle). Note though that the Opteron may suffer less drop-off from the peak, but it's not going to be enough to make up for that factor of two. Certainly the Itanium 2 does: it has the same flops/MHz as the G5 and less drop-off. The problem is, it peaks out at 1.5 GHz. What then when faster 970s launch early next year and again see a big bump due to going to 90nm?

BTW, Altivec/VMX/Velocity doesn't get involved here because it can't do double precision multiply-adds. Whenever there is something optimized for the Altivec, it's impressive (a Slashdotter mentioned the P4's performance at distributed.net and had it thrown back in his face when stats showed G5s outperformed them by a factor of 3 or more due to these programs being ideal cases for Altivec optimization). Note that LINPACK (the benchmark used by the Top500) is very dependent on network speed, which is why I keep mentioning the network when I mention the stats above.

If you can use Altivec, the G5 usually beats the Opteron, which beats the G5 for non-vectorizable integer performance. I still think the Opteron is probably the best web server chip around and perhaps the best price/performance database chip, if that means anything.

Finally, heat isn't an issue. First, the 2 GHz G5 generates 47 watts of heat, which is half that of the P4 @ 3 GHz. Second, IBM introducing G5 blades shows that heat isn't an issue. The problem for Virginia Tech was that IBM, even though it was the first choice, wouldn't have had a machine ready in time for the fall Top500. By the stiff competition for #3-6 (all new machines), you can see why it was important that they ship this year.

I don't know what the "self-made" thing means, but I don't think it's related to cost, like some pundits claim. Remember, Apple beat out all other bids tendered using the list price by a factor of two. The only bid in question was the Dell Itanium system that fell through when "Dell was exploring pricing options" whatever that means (IMO, it probably means Dell knows nothing about building clusters based on 64 bit chips and should stick to ripping people off on their 32 bit systems no matter how desperate they were to undercut Apple).

Finally, with a little wryness, I think Apple needs to give a big thank you to Intel, Dell, HP, IBM, and Sun for creating Infiniband and making Apple's entry into the supercomputer marketplace possible. 😀

* Note $4.2 million is probably the education list price when spec'd with 2GB and accounting for spare machines. The actual list price for 1100 machines is around $4.4 million.

pkradd · Nov 16, 2003

As stated above, Big Mac is not yet operating at full potential. It may be able to add 3 or 4 more teraflops. There will be another list published in 6 months so there should be some new computers named and Big Mac may or may not stay where it is. As far as the 3rd fastest computer in the world, I'd take that with a grain of salt. Many governments have computer systems that are not advertised (known) that may be as fast or faster than any of those posted on the list.

T.Rex · Nov 16, 2003

Anyone else notice VT is now at 10.3 Tflops? Coincidence? Yeah. 😀

Rincewind42 · Nov 16, 2003

Re: Re: Re: The Next Supercomputer?

Originally posted by greenstork
To quash this myth before it gets too out of control: you can't just add computers and expect a supercomputer to get that much faster. For every additional node on the system, the efficiency of the whole system decreases, so merely doubling the number of computers *will not* double the computing power. Since the decrease in efficiency is on an exponential scale, you'd have to increase the number of nodes exponentially to achieve the desired speed increase. Hope this clears things up a bit for all those pie-in-the-sky posts.

Exponential is worst case. It is possible to get linear increase - the problem space in which you can get it is just very small. The exponential case comes from each computer in the cluster needing to get data that is on another (potentially all the other) computers on the cluster. Most problems aren't that bad - a large part of them can be solved on a single node before data needs to be transmitted between nodes. Linear algebra is actually a middle of the road problem space, one where you can do a lot of work on a single node before needing information from another node. If you'll note, when the VT cluster got their last 44 systems online they were still able to increase efficiency of the cluster.

Just look at the various distributed computing programs going on right now. They comprise tens of thousands of computers around the world and yet they still manage to get very good performance - all because the problem can be reduced to running on one machine before being sent back to a master machine for analysis.
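
One simple way to put numbers on this back-and-forth is Amdahl's law, which nobody in the thread actually names and which is used here only as a stand-in model. It assumes a fixed fraction of the work parallelizes perfectly and the rest is serial; real LINPACK scaling also depends on interconnect and problem size, which this sketch ignores. The 1100 and 2200 node counts are just illustrative.

```python
def amdahl_speedup(nodes, parallel_fraction):
    """Speedup over one node when `parallel_fraction` of the work scales perfectly."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / nodes)


def scaling_efficiency(nodes, parallel_fraction):
    """Speedup per node; 1.0 would mean perfectly linear scaling."""
    return amdahl_speedup(nodes, parallel_fraction) / nodes


# Compare doubling the node count for a few hypothetical parallel fractions.
for p in (1.0, 0.999, 0.99):
    gain = amdahl_speedup(2200, p) / amdahl_speedup(1100, p)
    print(f"parallel fraction {p}: doubling 1100 -> 2200 nodes "
          f"multiplies speed by {gain:.2f}")
```

In this model a perfectly parallel problem doubles in speed when the nodes double, while even a small serial share makes the gain fall well short of 2x, which is roughly the middle ground between the two posts.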

greenstork · Nov 16, 2003

You know what I'd love to see is the cost per teraflop. I bet Apple is by far the cheapest of the whole top500. Now that would be an interesting stat.
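
A very rough cut at cost per teraflop can be made from the dollar figures posted earlier in the thread. Everything here is an estimate: adding the $4.2M Mac list price to the $1.5M Infiniband figure is an assumption, the Dell range covers list price for the machines only, and the $9-10M range is what Virginia Tech reportedly priced for Itanium and Opteron systems, not what those labs actually paid.

```python
# (system, sustained Tflops, (low, high) cost estimate in $ millions).
# All cost figures are thread estimates, not actual purchase prices.
ESTIMATES = [
    ("Virginia Tech G5", 10.28, (5.7, 5.7)),    # assumed: 4.2 Macs + 1.5 Infiniband
    ("P4 Xeon (Dell list)", 9.82, (6.7, 8.4)),  # machines only, no interconnect
    ("Itanium 2", 8.63, (9.0, 10.0)),
    ("Opteron", 8.05, (9.0, 10.0)),
]


def cost_per_tflop(cost_millions, tflops):
    """Cost in $ millions per sustained Tflop."""
    return cost_millions / tflops


for name, tflops, (low, high) in ESTIMATES:
    print(f"{name}: ${cost_per_tflop(low, tflops):.2f}M to "
          f"${cost_per_tflop(high, tflops):.2f}M per Tflop")
```

Treat the output as a ballpark only; none of these figures include facilities, cooling, or staff time.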

greenstork · Nov 16, 2003

Re: Re: Re: Re: The Next Supercomputer?

Originally posted by Rincewind42
Exponential is worst case. It is possible to get linear increase - the problem space in which you can get it is just very small. The exponential case comes from each computer in the cluster needing to get data that is on another (potentially all the other) computers on the cluster. Most problems aren't that bad - a large part of them can be solved on a single node before data needs to be transmitted between nodes. Linear algebra is actually a middle of the road problem space, one where you can do a lot of work on a single node before needing information from another node. If you'll note, when the VT cluster got their last 44 systems online they were still able to increase efficiency of the cluster.

Just look at the various distributed computing programs going on right now. They comprise tens of thousands of computers around the world and yet they still manage to get very good performance - all because the problem can be reduced to running on one machine before being sent back to a master machine for analysis.

I would argue that distributed computing (e.g. processing one unit on one machine) is not what the cluster computer is trying to achieve, hence the Infiniband connection between the nodes. So while your analogy to distributed computing holds true on paper, it's not in the nature of the tasks that a cluster supercomputer is designed for, IMO. I could be wrong, but that is my impression.

bobindashadows · Nov 16, 2003

Re: Summary of my posts on SlashDot.

Originally posted by tychay
Reformatted for MacRumors.
I don't know what the "self-made" thing means, but I don't think it's related to cost, like some pundits claim.

Well, that was an impressive post. My impression of what "self-made" means is that the supercomputer was not created by the manufacturer. Had the G5 cluster been brought into the facility by Apple and then constructed by Apple employees, then it would not have been "self-made". But since VT just bought the computers, wrote their own software, and hooked up Infiniband, it's self-made.

gwuMACaddict · Nov 16, 2003

Originally posted by T.Rex
Anyone else notice VT is now at 10.3 Tflops? Coincidence? Yeah. 😀

hahaha.

I have to think this will help thrust Apple to the forefront when it comes to universities and research labs looking for excellent computing power at a cheaper price.
