View Full Version : Virginia Tech PowerMac Cluster Ranks 3rd
MacRumors
Nov 16, 2003, 10:33 AM
Results from the 22nd edition of the TOP500 List of the World's Fastest Supercomputers were released today (http://www.top500.org/lists/2003/11/press-release.php).
As expected, Virginia Tech's PowerMac G5 Cluster (http://www.macrumors.com/pages/2003/09/20030902190056.shtml) has been officially ranked 3rd fastest in the world, behind the Earth Simulator Center (#1) and the Los Alamos National Laboratory (#2). The updated list can be found at Top500.org (http://www.top500.org/list/2003/11/).
More PowerMac G5 clusters are expected to be assembled in the future, as Virginia Tech plans (http://www.macrumors.com/pages/2003/11/20031105193558.shtml) to release detailed documentation on the construction of its cluster.
Rocketman
Nov 16, 2003, 10:37 AM
Data
Virginia Polytechnic cluster
2200 IBM PowerPC 970 in Apple G5 2.0 Ghz processors
10.3 Tflops, 0.00468 Tflop/processor
Apple (Ranked 3 in world)
OSX
National Center for Supercomputing cluster
2500 Intel Xeon 3.06 Ghz processors
9.8 Tflops, 0.00392 Tflop/processor
Dell (Ranked 4 in world)
Red Hat
Dell/Intel 16.2% slower per processor than Apple/IBM
Pacific Northwest cluster
1936 Intel Itanium 2 1.5 Ghz
8.6 Tflops, 0.00444 Tflop/processor
HP (Ranked 5 in world)
Red Hat
HP/Intel 5.1% slower per processor than Apple/IBM
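For anyone who wants to re-run the per-processor arithmetic, here is a minimal sketch using only the Tflops and processor counts listed above. The function and variable names are made up for illustration. Working from unrounded values, the Dell/Intel gap comes out nearer 16.3% than 16.2%; the HP/Intel figure stays at about 5.1%.

```python
# Figures as quoted in the post above: (sustained Tflops, processor count).
clusters = {
    "Virginia Tech (Apple/IBM)": (10.3, 2200),
    "NCSA (Dell/Intel Xeon)": (9.8, 2500),
    "PNNL (HP/Intel Itanium 2)": (8.6, 1936),
}


def tflops_per_cpu(tflops, cpus):
    """Sustained Tflops delivered per processor."""
    return tflops / cpus


def percent_slower(other, baseline):
    """How far 'other' falls below 'baseline', as a percentage of baseline."""
    return (baseline - other) / baseline * 100.0


per_cpu = {name: tflops_per_cpu(*figures) for name, figures in clusters.items()}
baseline = per_cpu["Virginia Tech (Apple/IBM)"]

for name, value in per_cpu.items():
    print(f"{name}: {value:.5f} Tflop/processor, "
          f"{percent_slower(value, baseline):.1f}% below VT")
```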
Rocketman
Puts UK advertising in context, eh?
"I told you so"- Steve Jobs in an insanely great moment.
MattG
Nov 16, 2003, 10:39 AM
Wonder how many DP G5's linked together it'd take to make it the #1 Supercomputer?
Freg3000
Nov 16, 2003, 10:40 AM
I soooooo want a Virginia Tech Switcher commercial, even though that campaign is probably over and VT didn't really switch, they just bought a lot of G5s. But how cool would that be?
"We are Virginia Tech, and.....We have the 3rd fastest computer in the world."
And the British wouldn't be able to ban that one, because it would be 100% true. :D
adelaney
Nov 16, 2003, 11:11 AM
I think it's also significant that there is only one other self-made supercomputer in the top 100, at position 67. VT is going to release how it made the thing and everything you need to know to make your own; presumably with all the others you just have to contact the manufacturer and accept the price they give you for the whole shebang. Obviously you'd be able to customize the thing to your needs, but not as much as if you were able to build it yourself.
Rincewind42
Nov 16, 2003, 11:22 AM
Originally posted by MattG
Wonder how many DP G5's linked together it'd take to make it the #1 Supercomputer?
At least 2560 Dual PMG5s, as that would match the Peak of the Earth Simulator (yes, the G5 matches the Earth Simulator's peak flop rating cpu-for-cpu). Accounting for overhead, probably closer to 3000 machines however.
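A quick sketch of where the 2560 comes from, plus a naive sustained-rate version. The Earth Simulator numbers (5120 CPUs at 8 Gflops peak each, roughly 35860 Gflops sustained on LINPACK) and the 8 Gflops peak for a 2.0 GHz G5 CPU are assumptions brought in for illustration, not figures from the post. The sustained estimate also assumes per-machine throughput stays what VT measured as the cluster grows, which is optimistic, so treat it as a rough bound and not a correction to the "closer to 3000" guess.

```python
import math

# Assumptions, not stated in the post: Earth Simulator = 5120 CPUs at 8 Gflops
# peak each, sustaining about 35860 Gflops; a 2.0 GHz G5 CPU peaks at 8 Gflops.
EARTH_SIM_RPEAK_GFLOPS = 5120 * 8.0
EARTH_SIM_RMAX_GFLOPS = 35860.0
G5_PEAK_GFLOPS_PER_CPU = 8.0
CPUS_PER_POWERMAC = 2

# From this thread: 1100 dual G5s sustained 10.28 Tflops.
VT_RMAX_GFLOPS = 10280.0
VT_MACHINES = 1100


def machines_needed(target_gflops, gflops_per_machine):
    """Smallest whole number of machines whose combined rate reaches the target."""
    return math.ceil(target_gflops / gflops_per_machine)


# Matching peak ratings, CPU for CPU.
peak_match = machines_needed(
    EARTH_SIM_RPEAK_GFLOPS, G5_PEAK_GFLOPS_PER_CPU * CPUS_PER_POWERMAC
)

# Matching sustained LINPACK, if every added machine delivered what VT's do now.
sustained_match = machines_needed(
    EARTH_SIM_RMAX_GFLOPS, VT_RMAX_GFLOPS / VT_MACHINES
)

print(f"Dual G5s to match peak: {peak_match}")
print(f"Dual G5s to match sustained rate (naive): {sustained_match}")
```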
~Shard~
Nov 16, 2003, 11:27 AM
So which university is going to order 1100 Dual 2.5 GHz G5 PowerMacs when the speed boosts are announced in January @ MWSF? ;)
MacRETARD
Nov 16, 2003, 11:29 AM
It's only running at 1.5 GHz and is 5% slower per CPU while the G5 is running at 2 GHz!
OK, sorry, I'm just picturing what some zealot would say if there was a P4 cluster running at 3.2 GHz that was ranked higher.
I would like to see how an Opteron rates; we should know soon. AMD sold a 10,368 CPU cluster to Sandia National Labs.
Rincewind42
Nov 16, 2003, 11:33 AM
Originally posted by Freg3000
I soooooo want a Virginia Tech Switcher commercial, even though that campaign is probably over and VT didn't really switch, they just bought a lot of G5s. But how cool would that be?
Actually, I can see it now. Walk with me...
Commercial fades in with the title "My First Super Computer". A montage of the VT volunteers putting together the cluster, copious shots of the PowerMac G5 throughout, and real home movie feel. Then towards the end we switch to an awards ceremony taking place, with an announcer reading off the top 5 (incoherent until we reach #3). Then we hear the VT cluster, rank 3 in the world. The Dell cluster, rank 4 in the world, fade out incoherent to the rest of the ranking. Jeff Goldblum comes in and narrates over a spinning G5.
"Why don't you start building your own Super Computer?", fade to Apple Logo, fade out.
Rocketman
Nov 16, 2003, 11:38 AM
Originally posted by ~Shard~
So which university is going to order 1100 Dual 2.5 GHz G5 PowerMacs when the speed boosts are announced in January @ MWSF? ;)
Whoever spends a mere $10m on 2200 of the more likely 2.6Ghz systems with an upgraded FSB speed will be tickling #1 in the world, and with first-generation 970 G5s at that.
Computers for the rest of us indeed.
And just in case you are confused, $10m is dirt cheap as compared to the top 5, last ranking. Mellanox deserves supreme kudos as well for Infiniband.
This is all off the shelf stuff guys and anyone can get a few students together with free pizza and cola!
Thank you pizza and cola, the Apple accessory of choice for supercomputer installers!
Question: What if someone just spends a traditional supercomputer budget of $25-40m and sees where that leads?
Rocketman
Rincewind42
Nov 16, 2003, 11:40 AM
Originally posted by MacRETARD
It's only running at 1.5 GHz and is 5% slower per CPU while the G5 is running at 2 GHz!
OK, sorry, I'm just picturing what some zealot would say if there was a P4 cluster running at 3.2 GHz that was ranked higher.
I would like to see how an Opteron rates; we should know soon. AMD sold a 10,368 CPU cluster to Sandia National Labs.
Yea, the Itanium is a floating point monster, same Peak per clock cycle as the 970 in fact. Likely the difference is that they had more time to optimize their cluster vs the VT cluster, as they are at 74% efficiency vs VT at 58%. They'll get another chance to benchmark next year, so who knows what we'll see then. As for the Opteron cluster, I suspect that it has the same peak as the 970, but I don't know how it performs - all the benchmarks I've seen so far have shown Opteron vs 970 to be a wash.
RalphNumbers
Nov 16, 2003, 11:57 AM
Originally posted by Rincewind42
Yea, the Itanium is a floating point monster, same Peak as the 970 in fact. Likely the difference is that they had more time to optimize their cluster vs the VT cluster, as they are at 74% efficiency vs VT at 58%. They'll get another chance to benchmark next year, so who knows what we'll see then . As for the Opteron cluster, I suspect that it has the same peak as the 970, but I don't know how it performs - all the benchmarks I've seen so far have shown Opteron vs 970 to be a wash.
Actually a 2816 Opteron 2Ghz cluster came in 6th, a pretty disappointing performance.
The 3 machines immediately behind Bigmac are interesting:
Rank | Site | Country/Year | Name | Computer / Processors | Manufacturer | Rmax | Rpeak
4 | NCSA | United States/2003 | Tungsten | PowerEdge 1750, P4 Xeon 3.06 GHz, Myrinet / 2500 | Dell | 9819 | 15300
5 | Pacific Northwest National Laboratory | United States/2003 | Mpp2 | Integrity rx2600 Itanium2 1.5 GHz, Quadrics / 1936 | HP | 8633 | 11616
6 | Los Alamos National Laboratory | United States/2003 | Lightning | Opteron 2 GHz, Myrinet / 2816 | Linux Networx | 8051 | 11264
As you can see here http://www.top500.org/list/2003/11/
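To put the Rmax and Rpeak columns side by side, here is a small sketch that computes LINPACK efficiency (Rmax divided by Rpeak) and peak Gflops per CPU for the three systems listed above. Only the figures from the listing are used; the helper names are invented for the example.

```python
# (name, CPUs, Rmax in Gflops, Rpeak in Gflops) as listed above.
systems = [
    ("Tungsten (P4 Xeon 3.06 GHz)", 2500, 9819, 15300),
    ("Mpp2 (Itanium2 1.5 GHz)", 1936, 8633, 11616),
    ("Lightning (Opteron 2 GHz)", 2816, 8051, 11264),
]


def efficiency(rmax, rpeak):
    """Fraction of theoretical peak actually sustained on LINPACK."""
    return rmax / rpeak


def peak_per_cpu(rpeak, cpus):
    """Theoretical peak Gflops per processor."""
    return rpeak / cpus


for name, cpus, rmax, rpeak in systems:
    print(f"{name}: {efficiency(rmax, rpeak):.1%} efficient, "
          f"{peak_per_cpu(rpeak, cpus):.2f} Gflops peak per CPU")
```

On these numbers the Opteron system runs at a respectable fraction of its peak; its peak per CPU is simply lower than the others.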
tazznb
Nov 16, 2003, 12:14 PM
Originally posted by Rocketman
Data
Virginia Polytechnic cluster
2200 IBM PowerPC 970 in Apple G5 2.0 Ghz processors
10.3 Tflops, 0.00468 Tflop/processor
Apple (Ranked 3 in world)
OSX
National Center for Supercomputing cluster
2500 Intel Xeon 3.06 Ghz processors
9.8 Tflops, 0.00392 Tflop/processor
Dell (Ranked 4 in world)
Red Hat
Dell/Intel 16.2% slower per processor than Apple/IBM
Pacific Northwest cluster
1936 Intel Itanium 2 1.5 Ghz
8.6 Tflops, 0.00444 Tflop/processor
HP (Ranked 5 in world)
Red Hat
HP/Intel 5.1% slower per processor than Apple/IBM
Rocketman
Puts UK advertising in context, eh?
"I told you so"- Steve Jobs in an insanely great moment.
(You should've added prices for all listed)
They should score really big on the price point;
Steve Jobs should hire MC Hammer, and have him rap to Intel & the rest saying "You can't touch this!"
zweigand
Nov 16, 2003, 12:25 PM
Originally posted by Rocketman
Data
Virginia Polytechnic cluster
2200 IBM PowerPC 970 in Apple G5 2.0 Ghz processors
10.3 Tflops, 0.00468 Tflop/processor
Apple (Ranked 3 in world)
OSX
National Center for Supercomputing cluster
2500 Intel Xeon 3.06 Ghz processors
9.8 Tflops, 0.00392 Tflop/processor
Dell (Ranked 4 in world)
Red Hat
Dell/Intel 16.2% slower per processor than Apple/IBM
Pacific Northwest cluster
1936 Intel Itanium 2 1.5 Ghz
8.6 Tflops, 0.00444 Tflop/processor
HP (Ranked 5 in world)
Red Hat
HP/Intel 5.1% slower per processor than Apple/IBM
Rocketman
Puts UK advertising in context, eh?
"I told you so"- Steve Jobs in an insanely great moment.
Actually, according to NCSA's WebSite (http://www.ncsa.uiuc.edu/UserInfo/Resources/Hardware/XeonCluster/) the cluster is comprised of 1450 Dell PowerEdge Servers ...and since each of those are dual proc systems ...do the math ...that's 2900 procs, not 2500
might want to update your stats above ;)
greenstork
Nov 16, 2003, 12:39 PM
Originally posted by Rocketman
Whoever spends a mere $10m on 2200 of the more likely 2.6Ghz systems with an upgraded FSB speed will be tickling #1 in the world, and with first-generation 970 G5s at that.
To quash this myth before it gets too out of control. You can't just add computers and expect a supercomputer to get that much faster. For every additional node on the system, the efficiency of the whole system decreases so merely doubling the number of computers *will not* double the computing power. Since the decrease in efficiency is on an exponential scale, you'd have to increase the number of nodes exponentially to achieve the desired speed increase. Hope this clears things up a bit for all those pie-in-the-sky posts.
That said, this is excellent news, especially since Apple is squarely in front of the 4th place Dell system, woohooo :D
theRebel
Nov 16, 2003, 12:41 PM
Originally posted by zweigand
Actually, according to NCSA's WebSite (http://www.ncsa.uiuc.edu/UserInfo/Resources/Hardware/XeonCluster/) the cluster is comprised of 1450 Dell PowerEdge Servers ...and since each of those are dual proc systems ...do the math ...that's 2900 procs, not 2500
might want to update your stats above ;)
Yeah, I had noticed that too.
However the stats posted here are the same as what top500.org has posted in their chart. It is possible that they may just have a typo though.
It is also possible that the November stats for the NCSA cluster were achieved with only 1250 systems, but that the NCSA is planning to add another 200. Their press release does not make it clear whether 1450 is how many they have now or how many they plan to have.
How can we find out which is right? The Top500.org chart or the NCSA press release?
AppleManEric
Nov 16, 2003, 12:44 PM
Can a node in the VT supercomputer be replaced with a faster model in the future? Or will having two different speeds of procs screw it up?
tychay
Nov 16, 2003, 12:45 PM
Reformatted for MacRumors.
The G5 Terascale cluster at Virginia Tech at #3 (10.28 Tflops/s, 2200 CPU, Infiniband) is the first academic computer to break 10 teraflops/s. This extra performance was promised at the Mac OS X Developer's conference (http://www.macdevcenter.com/pub/a/mac/2003/10/29/osxcon_g5cluster.html) last month. Not too sure if the price is a testament to Infiniband ($1.5 million cabling, cards, and routers) or the Macs ($4.2 million list price*).
Good thing too because in a surprise move the NCSA cluster (http://access.ncsa.uiuc.edu/Releases/08.04.03_New_NCSA_C.html) made the list at #4 (9.82 Tflops/s, 2500 CPU, Myrinet) and might have beaten Terascale's previously reported 9.555 Tflops/s. This cluster is built using Dells running Pentium 4 XEONs and Red Hat Linux. One subtle point to note is that they didn't get all the systems online in time (there should be 2900 CPUs, not 2500). I bet the PSC who coded déjà vu (http://www.chaosmint.com/mac/vt-supercomputer/) and an ex-Chief Scientist of SDSC (http://computing.vt.edu/research_computing/terascale/pressrelease.html) are appreciating having a hand in edging out their arch-rival NCSA for #3--not to mention Apple beating Dell. :)
The fastest Itanium cluster is at #5 (8.63 TFlops/s, 1936 CPU, Quadrics), which looks like the odd man out, boxed in by PC-based systems using the Linux (http://siliconvalley.internet.com/news/article.php/2249001) OS and Myrinet networking: the P4 Xeon above and the most powerful Opteron system at #6 below (8.05 Tflops/s, 2816 CPU, Myrinet).
And finally, It's easy to overlook #73, a single compute node of BlueGene/L (1.44 Tflops/s, 1024 CPU). Imagine 128 of these (http://news.com.com/2100-7337-5107422.html) connected together and you have something that will easily take #1 when it's completed even if we handicap it 20-40%. As noted on SlashDot earlier (http://slashdot.org/article.pl?sid=03/10/31/0232243&tid=), this is also on Linux. But Mac users should note that its CPU is based on the PowerPC architecture (http://www.nyjournalnews.com/newsroom/111403/d01a14ibm.html).
Note that this Mac cluster is no longer 10x cheaper than its peers since #4-6 were built using the Pentium 4 Xeon, Itanium, and Opteron respectively. According to Virginia Tech, which priced the latter two, the cost for those systems would be around $9-10 million--twice as much. Not absolutely sure how much the P4 XEON system cost, but given the number of CPUs used, I'd say it's not price-competitive with the Mac. For instance, you can go to Dell's website (http://www1.us.dell.com/content/products/productdetails.aspx/pedge_1750?c=us&cs=555&l=en&s=biz) and price the same 1250 machines for $6.7-8.4 million (the upper price includes Red Hat and 1MB cache). Where is the vaunted value in IA32 now? Therefore, of the four, the G5 (970) offers the highest flops/cpu, the second most flops/cycle (Rpeak is the same/cycle as Itanium but it's not as efficient so its Rmax gets edged out), and untouchable price/performance. (You also get a DVD burner, a good video card, optical in/out, firewire, and a historically high resale value when you want to upgrade your systems).
As noted on Mac Rumors earlier, IBM will be introducing G5-based blades (http://www-1.ibm.com/support/docview.wss?uid=psg1MIGR-53431) in the same vein as their P4 Xeon (http://www-132.ibm.com/webapp/wcs/stores/servlet/CategoryDisplay?categoryId=2575417&storeId=1&catalogId=-840&langId=-1) ones. Since IBM supports the Opteron and Itanium, I hope they're soon to follow. Imagine choosing any combination of 4 different CPU families in your blade center. How's that for business flexibility! (Now if Apple licenses Mac OS X Server to IBM, my dream comes true.)
One person claimed that Apple did a lot of assembly language hacking. Hardly! I'm sure the Apple sales people moved mountains to get those systems shipped on time not to mention the work done making drivers for the Infiniband cards, but the coding was mostly one man working two months (http://macslash.org/article.pl?sid=03/10/28/2357235&mode=nested) getting some software ported and various other libraries from people such as Professor Goto. I'm sure the other systems had as much, probably more, hacking done.
The Opteron is a horrible performer in this benchmark. This is one to rub in the face of anyone who blabs about Opteron, but please do it right. Simply put, the Rpeak of the 2Ghz G5 is twice that of the 2Ghz Opteron: 4 flops/cycle vs 2 flops/cycle, which works out to 8 Gflops/s vs 4 Gflops/s per CPU. Note though that the Opteron may suffer less drop-off from the peak, but it's not going to be enough to make up for that factor of two. Certainly the Itanium 2 suffers less drop-off: it has the same flops/Mhz as the G5 and loses less of its peak. The problem is, it peaks out at 1.5 Ghz. What then when faster 970's launch early next year and again see a big bump due to going 90nm?
BTW, Altivec/VMX/Velocity doesn't get involved here because it can't do double precision mult-adds. Whenever there is something optimized for the Altivec, it's impressive (a Slashdotter mentioned the P4's performance at distributed.net (http://distributed.net/) and had it thrown back in his face when stats (http://n0cgi.distributed.net/speed/query.php) showed G5s outperformed them by a factor of 3 or more due to these programs being ideal cases for Altivec optimization). Note that LINPACK (benchmark used by the Top500) is very dependent on network speed, which is why I keep mentioning the network when I mention the stats above.
If you can use Altivec, the G5 usually beats the Opteron, which beats the G5 for non-vectorizeable integer performance. I still think the Opteron is probably the best web server chip around and perhaps the best price/performance database chip, if that means anything.
Finally, heat isn't an issue. First, the 2Ghz G5 generates 47 watts of heat which is half that of the P4 @ 3Ghz. Second, IBM introducing G5 blades shows that heat isn't an issue. The problem for Virginia was that IBM, even though it was the first choice, wouldn't have had a machine ready in time for the Fall 500. By the stiff competition for #3-6 (all new machines), you can see why it was important that they ship this year.
I don't know what the "self-made" thing means, but I don't think it's related to cost, like some pundits claim. Remember, Apple beat out all other bids tendered using the list price by a factor of two. The only bid in question was the Dell Itanium system that fell through when "Dell was exploring pricing options" whatever that means (IMO, it probably means Dell knows nothing about building clusters based on 64 bit chips (http://www1.us.dell.com/content/products/productdetails.aspx/pedge_3250?c=us&cs=555&l=en&s=biz) and should stick to ripping people off on their 32 bit systems no matter how desperate they were to undercut Apple).
Finally, with a little wryness, I think Apple needs to give a big thank you to Intel, Dell, HP, IBM, and Sun for creating infiniband (http://www.infinibandta.org/) and making Apple's entry into the supercomputer marketplace possible. :D
* Note $4.2 million is probably the education list price when spec'd with 2GB and accounting for spare machines. The actual list price for 1100 machines is around $4.4 million.
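For reference, the per-CPU Rpeak comparison in this post comes down to clock speed times floating-point operations per cycle. A minimal sketch follows; the flops-per-cycle values (4 for the 970 and Itanium 2, 2 for the Opteron and P4 Xeon) are assumptions chosen to agree with the Rpeak and CPU counts quoted elsewhere in this thread, not official vendor figures.

```python
# chip: (clock in GHz, double-precision flops per cycle).
# The flops-per-cycle values are assumptions for illustration.
chips = {
    "PowerPC 970 (G5)": (2.0, 4),
    "Opteron": (2.0, 2),
    "Itanium 2": (1.5, 4),
    "P4 Xeon": (3.06, 2),
}


def peak_gflops(clock_ghz, flops_per_cycle):
    """Theoretical peak per CPU: every cycle retires the maximum flops."""
    return clock_ghz * flops_per_cycle


peaks = {name: peak_gflops(*spec) for name, spec in chips.items()}

for name, gflops in peaks.items():
    print(f"{name}: {gflops:.2f} Gflops peak per CPU")
```

Under these assumptions the G5 has twice the Opteron's peak at the same clock, while the Itanium 2 matches the G5 per cycle but gives up ground on clock speed.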
pkradd
Nov 16, 2003, 12:48 PM
As stated above, Big Mac is not yet operating at full potential. It may be able to add 3 or 4 more teraflops. There will be another list published in 6 months so there should be some new computers named and Big Mac may or may not stay where it is. As far as the 3rd fastest computer in the world, I'd take that with a grain of salt. Many governments have computer systems that are not advertised (known) that may be as fast or faster than any of those posted on the list.
T.Rex
Nov 16, 2003, 12:50 PM
Anyone else notice VT is now at 10.3 Tflops? Coincidence? Yeah. :D
Rincewind42
Nov 16, 2003, 01:01 PM
Originally posted by greenstork
To quash this myth before it gets too out of control. You can't just add computers and expect a supercomputer to get that much faster. For every additional node on the system, the efficiency of the whole system decreases so merely doubling the number of computers *will not* double the computing power. Since the decrease in efficiency is on an exponential scale, you'd have to increase the number of nodes exponentially to achieve the desired speed increase. Hope this clears things up a bit for all those pie-in-the-sky posts.
Exponential is worst case. It is possible to get linear increase - the problem space in which you can get it is just very small. The exponential case comes from each computer in the cluster needing to get data that is on another (potentially all the other) computers on the cluster. Most problems aren't that bad - a large part of them can be solved on a single node before data needs to be transmitted between nodes. Linear algebra is actually a middle of the road problem space, one where you can do a lot of work on a single node before needing information from another node. If you'll note, when the VT cluster got their last 44 systems online they were still able to increase efficiency of the cluster.
Just look at the various distributed computing programs going on right now. They comprise tens of thousands of computers around the world and yet they still manage to get very good performance - all because the problem can be reduced to running on one machine before being sent back to a master machine for analysis.
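A toy model of the point about communication patterns, as a rough sketch only. It assumes the work splits evenly across nodes and that each node pays a fixed cost per partner it must exchange data with; the cost value, the four-neighbor pattern, and all names are invented for illustration and say nothing about how the VT cluster actually behaves.

```python
def run_time(nodes, work=1000.0, msg_cost=0.01, pattern="independent"):
    """Toy run time: evenly split compute plus a per-partner messaging cost."""
    compute = work / nodes
    if pattern == "independent":
        partners = 0                    # each node works alone
    elif pattern == "neighbors":
        partners = min(nodes - 1, 4)    # a handful of neighbors, at most
    elif pattern == "all_to_all":
        partners = nodes - 1            # every node talks to every other node
    else:
        raise ValueError(f"unknown pattern: {pattern}")
    return compute + msg_cost * partners


def speedup(nodes, **kwargs):
    """Speed-up relative to running the same job on a single node."""
    return run_time(1, **kwargs) / run_time(nodes, **kwargs)


for pattern in ("independent", "neighbors", "all_to_all"):
    s1 = speedup(1100, pattern=pattern)
    s2 = speedup(2200, pattern=pattern)
    print(f"{pattern}: {s1:.0f}x on 1100 nodes, {s2:.0f}x on 2200 nodes")
```

In this model the independent and neighbor cases keep gaining as nodes are added, while the all-to-all case actually gets slower when the node count doubles.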
greenstork
Nov 16, 2003, 01:04 PM
You know what I'd love to see is the cost per teraflop. I bet Apple is by far the cheapest of the whole top500. Now that would be an interesting stat.
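A rough cut at that number, using only the prices tychay quoted earlier in this thread: $4.2 million for the Macs plus $1.5 million for Infiniband at 10.28 Tflops, against $6.7-8.4 million for 1250 Dell machines at 9.82 Tflops. Note the Dell range covers machines only, with no interconnect, and none of these are confirmed purchase prices, so this is a back-of-envelope sketch and nothing more.

```python
def cost_per_tflop(cost_musd, rmax_tflops):
    """Millions of dollars per sustained LINPACK teraflop."""
    return cost_musd / rmax_tflops


# Prices in millions of USD, as quoted earlier in the thread.
vt = cost_per_tflop(4.2 + 1.5, 10.28)     # Macs plus Infiniband
dell_low = cost_per_tflop(6.7, 9.82)      # machines only, base configuration
dell_high = cost_per_tflop(8.4, 9.82)     # machines only, with Red Hat and 1MB cache

print(f"VT G5 cluster: ${vt:.2f}M per Tflop")
print(f"NCSA Dell cluster (machines only): ${dell_low:.2f}M to ${dell_high:.2f}M per Tflop")
```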
greenstork
Nov 16, 2003, 01:09 PM
Originally posted by Rincewind42
Exponential is worst case. It is possible to get linear increase - the problem space in which you can get it is just very small. The exponential case comes from each computer in the cluster needing to get data that is on another (potentially all the other) computers on the cluster. Most problems aren't that bad - a large part of them can be solved on a single node before data needs to be transmitted between nodes. Linear algebra is actually a middle of the road problem space, one where you can do a lot of work on a single node before needing information from another node. If you'll note, when the VT cluster got their last 44 systems online they were still able to increase efficiency of the cluster.
Just look at the various distributed computing programs going on right now. They comprise tens of thousands of computers around the world and yet they still manage to get very good performance - all because the problem can be reduced to running on one machine before being sent back to a master machine for analysis.
I would argue that distributed computing (e.g. processing one unit on one machine) is not what the cluster computer is trying to achieve, hence the Infiniband connection between the nodes. So while your analogy to distributed computing holds true on paper, it's not in the nature of the tasks that a cluster supercomputer is designed for, IMO. I could be wrong, but that is my impression.
bobindashadows
Nov 16, 2003, 01:09 PM
Originally posted by tychay
Reformatted for MacRumors.
I don't know what the "self-made" thing means, but I don't think it's related to cost, like some pundits claim.
Well, that was an impressive post. My impression of what "self-made" means is that the supercomputer was not created by the manufacturer — had the G5 cluster been brought into the facility by Apple and then constructed by Apple employees, then it would not have been "self-made". But, since the VT just bought the computers, wrote their own software, and hooked up Infiniband, it's self-made.
gwuMACaddict
Nov 16, 2003, 01:20 PM
Originally posted by T.Rex
Anyone else notice VT is now at 10.3 Tflops? Coincidence? Yeah. :D
hahaha.
i have to think this will help thrust apple to the forefront when it comes to universities and research labs looking for excellent computing power at a cheaper price.
aethier
Nov 16, 2003, 02:16 PM
well i now owe my friend 10 dollars, i made a bet that Big Mac would come in number two... oh well.
maybe someone should get VT to join the macrumors folding team...
aethier
zweigand
Nov 16, 2003, 02:41 PM
Originally posted by theRebel
Yeah, I had noticed that too.
However the stats posted here are the same as what top500.org has posted in their chart. It is possible that they may just have a typo though.
It is also possible that the November stats for the NCSA cluster were achieved with only 1250 systems, but that the NCSA is planning to add another 200. Their press release does not make it clear whether 1450 is how many they have now or how many they plan to have.
How can we find out which is right? The Top500.org chart or the NCSA press release?
that very well could be true ...but if that were the case I'd make sure to let everyone know that results were based on an incomplete cluster! ...regardless, if it was only tested with 2500 procs that still doesn't change anything!! ...the 970 is clearly the faster processor in these tests! ;)
Wombatronic
Nov 16, 2003, 02:45 PM
Originally posted by greenstork
I would argue that distributed computing (e.g. processing one unit on one machine) is not what the cluster computer is trying to achieve, hence the Infiniband connection between the nodes. So while your analogy to distributed computing holds true on paper, it's not in the nature of the tasks that a cluster supercomputer is designed for, IMO. I could be wrong, but that is my impression.
You can rest assured that nearly all super computer tasks are designed to distribute well over a large cluster of nodes. Adding in more nodes should get you basically a linear speed-up until you start clogging up the network.
People are not building 2000 node clusters to shave a few seconds off of the time it took the 1000 node cluster. They are doing it because nearly all linear algebra, discretized differential equations, and particle simulations can be nicely partitioned and effectively distributed over a gazillion [sp?] odd nodes.
The main tricky thing is to get the system to properly use the network, partitioning nodes and communication up so that folks aren't tromping all over each other. Properly laid out, the nodes spend very little time waiting for one another, and most of the time computing and sending/receiving.
acj
Nov 16, 2003, 02:53 PM
Originally posted by greenstork
You know what I'd love to see is the cost per teraflop. I bet Apple is by far the cheapest of the whole top500. Now that would be an interesting stat.
I doubt it. Some of the lower ones on the list are also "home built" but from much cheaper components.
tortoise
Nov 16, 2003, 03:03 PM
Originally posted by greenstork
I would argue that distributed computing (e.g. processing one unit on one machine) is not what the cluster computer is trying to achieve, hence the Infiniband connection between the nodes. So while your analogy to distributed computing holds true on paper, it's not in the nature of the tasks that a cluster supercomputer is designed for, IMO. I could be wrong, but that is my impression.
It is a gradient. The algorithm space that will scale on a cluster is generally a function of interconnect latency rather than bandwidth. Most supercomputing clusters are only good for "embarrassingly parallel" algorithm spaces. By reducing the latency they can get the clusters to scale to "somewhat less embarrassingly parallel" algorithm spaces. Infiniband still does not have the latency specs required for a great many codes to effectively run in parallel.
One of the rather huge grains of salt that you have to take with any supercomputing cluster number is that the de facto standard benchmarks for these things are in fact embarrassingly parallel codes. For most real-world applications, you won't get anything remotely representing the efficiency implied by those benchmarks. For algorithm spaces that have many fine-grained dependencies (i.e. latency bound), you will find that huge supercomputing clusters are SLOWER than a single (or few) much smaller SMP or ccNUMA boxen.
For example, I happen to run huge information theoretic codes with fine-grained dependencies. For me, clusters like the VT cluster would actually run slower than a smaller lower latency box even taking Infiniband interconnects into account. The latency between any two processors in my systems needs to be in the ballpark of 1-us or less, or efficiency plummets and the processors aren't actually doing anything most of the time. Performance of my codes on the VT cluster would suck big time, and I could run rings around it with a much smaller box. Which is why you can't use a cluster to solve every problem and why they still make monolithic ultra-low-latency supercomputers.
So to give it proper perspective, clusters like the VT system are useless and not cost-effective for 90+% of the algorithms you could run. Its average interconnect latency is still pretty far on the "embarrassingly parallel" side of things, though much better than something like GigE, and the better interconnect will allow them to add more nodes for a given code before the fabric saturates. For the <10% of algorithms that run very well on them, it is well worth the money spent. Supercomputers in the more traditional sense will run most codes very efficiently, but are also more expensive than the more narrowly useful clusters.
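A minimal sketch of the latency argument, under a deliberately crude assumption: a processor does a fixed grain of computation, then stalls for one interconnect round trip before it can start the next grain. The grain sizes and latencies below are hypothetical values picked to show the shape of the effect; they are not measurements of Infiniband, GigE, or any real machine.

```python
def busy_fraction(compute_us, latency_us):
    """Share of time spent computing when every grain of work waits on one message."""
    return compute_us / (compute_us + latency_us)


# Hypothetical grain sizes and interconnect latencies, in microseconds.
grains = {"fine-grained (1 us of work)": 1.0, "coarse-grained (1000 us of work)": 1000.0}
latencies = {"1 us fabric": 1.0, "10 us fabric": 10.0, "50 us fabric": 50.0}

for grain_name, compute_us in grains.items():
    for fabric_name, latency_us in latencies.items():
        share = busy_fraction(compute_us, latency_us)
        print(f"{grain_name} on {fabric_name}: {share:.1%} busy")
```

In this model the coarse-grained job barely notices the slower fabrics, while the fine-grained one spends most of its time waiting as soon as latency exceeds the grain size.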
Rocketman
Nov 16, 2003, 03:06 PM
Someone posted a reminder that the VT cluster is composed of complete consumer friendly CPU's with ALL associated accessories installed, and that this will result in a great resale value in 2-3 years when they "sidegrade" to G5-2.8 ghz boxes :)
I would further point out that just like calculating the "cost of a lease" the life cycle cost of the hardware (including resale value) and support must be considered. I suspect if all of these factors are properly accounted for at the "point of sale" we will see the VT cluster being by far the lowest total cost cluster ever assembled on earth. And may maintain that record for quite some time.
As for black program computers, not only do they "not exist" :), but the security measures dramatically increase the costs of operation.
And finally my breaking comments about a 3U 7 blade Apple cluster, that will improve latency, density, and it will shock the "home built" compute world by it being available on the Apple Store to anyone, anywhere. It will be a race. He who chooses next day air and deep dish pizza wins:)
Rocketman
Classic
Nov 16, 2003, 03:20 PM
I wonder why Apple isn't advertising on the top500.org site. You'd think that they could have a banner ad that points out that their computer is third and cost much less than its competition....
tortoise
Nov 16, 2003, 03:22 PM
Originally posted by Wombatronic
You can rest assured that nearly all super computer tasks are designed to distribute well over a large cluster of nodes. Adding in more nodes should get you basically a linear speed-up until you start clogging up the network.
Very few supercomputer-type codes scale beyond 64 processors, even with something like an Infiniband interconnect fabric. Not surprisingly, people build these clusters to run codes that are already known to be in the narrow set of codes that WILL scale well to a large number of processors. Only a very tiny number of codes can scale linearly, the vast majority scale sub-linearly, and an infinitesimal number scale super-linearly (why super-linear is possible is left up as a technical exercise for the reader).
There are a huge range of supercomputing applications that will run slower on the VT cluster than on a more monolithic system with a fraction of the processors, as latency is the real killer in most cases. For example a cluster of a dozen custom Opteron-based boxes like this (www.octigabay.com) will outperform the VT cluster for many tasks. Not because a couple hundred Opterons is faster than a couple thousand PPC970s, but because the communication fabric is much faster and has much lower average latency in this case.
The only reason the Top500 list has so many clusters in it is because the benchmark they use is in the narrow set of codes that scale well on clusters. If they used a different code (e.g. the supercomputing codes I work on), the cluster benchmarks would drop like a rock and more monolithic systems would rule the roost. Clusters are excellent at what they can do, but there are many things in the supercomputing world that they scale very poorly on.
greenstork
Nov 16, 2003, 03:40 PM
Originally posted by Wombatronic
You can rest assured that nearly all super computer tasks are designed to distribute well over a large cluster of nodes. Adding in more nodes should get you basically a linear speed-up until you start clogging up the network.
People are not building 2000 node clusters to shave a few seconds off of the time it took the 1000 node cluster. They are doing it because nearly all linear algebra, discretized differential equations, and particle simulations can be nicely partitioned and effectively distributed over a gazillion [sp?] odd nodes.
The main tricky thing is to get the system to properly use the network, partitioning nodes and communication up so that folks aren't tromping all over each other. Properly laid out, the nodes spend very little time waiting for one another, and most of the time computing and sending/receiving.
I agree with you that getting these systems up to speed means proper use of the network. The VT cluster is highly dependent on the interconnects, the topology of the network, the software that does the clustering, the task at hand, etc.
However, I disagree that all supercomputing tasks are designed to distribute well, at least not in the sense of traditional distributed computing models like Seti@Home and Folding, which was the comment I originally responded to.
Different types of problems (read code threads) have different computing needs. For example, if you have a problem that requires a great deal of memory, you may have to pool memory over different nodes. Then the bandwidth and speed of your networking play a much greater role in the speed of your supercomputer. In a distributed model, you wouldn't even have the ability to pool memory efficiently, and thus this type of computing is best accomplished on a supercomputer like Big Mac, Earth Simulator, etc. edit: or low latency (traditional) supercomputers as suggested above.
There are computing problems where numerous nodes are fairly efficient. But these are tasks where you only need to allocate data to a node infrequently. These types of computing tasks will scale with the number of processors, but for those that require more frequent allocation between the nodes, the scaling isn't linear.
To state the obvious, the tools that you use depend on the problem, and in reality neither one of us knows what VT is planning to use their computer for anyway. ;)
crackpip
Nov 16, 2003, 04:12 PM
Originally posted by Wombatronic
People are not building 2000 node clusters to shave a few seconds off of the time it took the 1000 node cluster. They are doing it because nearly all linear algebra, discretized differential equations, and particle simulations can be nicely partitioned and effectively distributed over a gazillion [sp?] odd nodes.
I disagree. The 1100 node supercomputer is more for allowing many different people to run jobs at the same time. The top500 list is great for measuring the overall performance of a cluster, but most likely no one will be running a 2200 processor job. Most jobs will probably be less than 32 processors.
Real-world problems are generally not easy to parallelize efficiently. It takes a tremendous amount of work and is often unnecessary. To cite one of your examples, numerically solving differential equations (using finite difference methods) means communication between processes is required. Such communication increases with the surface area of the individual domains. Thus if you have a huge number of processes, you will have a lot of little domains that have, comparatively, large surface areas, and your efficiency drops rapidly. There is no way to get around this if your domain stays the same size. Obviously you can increase your domain, but oftentimes a specific task doesn't need a larger domain. And you may not feel like waiting the extra time for a simulation over a larger domain to finish.
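The surface-area argument can be sketched in a few lines of Python. This assumes a cubic grid split evenly into cubic subdomains, one per process, with a one-cell-deep exchange on each of the six faces; the function name and the grid sizes are made up for illustration and are not taken from any particular weather model.

```python
def halo_ratio(grid_side, procs_per_side):
    """Rough ratio of exchanged face cells to computed cells per subdomain."""
    # Assumes grid_side divides evenly by procs_per_side.
    n = grid_side // procs_per_side   # cells along one side of a subdomain
    volume = n ** 3                   # cells each process has to update
    surface = 6 * n ** 2              # rough count of cells on the six faces
    return surface / volume           # simplifies to 6 / n

if __name__ == "__main__":
    grid_side = 240                   # fixed overall domain size (assumed)
    for procs_per_side in (1, 2, 4, 12):
        total = procs_per_side ** 3
        ratio = halo_ratio(grid_side, procs_per_side)
        print(f"{total:5d} processes: communication/compute ratio ~ {ratio:.3f}")
```

Because the ratio works out to 6 / n, every time the subdomain side is halved the relative communication cost doubles, which is the efficiency drop described above for a domain of fixed size.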
I work on a large, parallelized numerical weather model. Specifically I am developing a subroutine which more accurately calculates the effects of turbulence. Parallelizing is tough, dirty work, and often requires a major reorganization of the program. You have to have a good understanding of the theory of the problem, the details of numerical techniques, and the methods of parallelization. And you have to have a lot of time.
crackpip
simply258
Nov 16, 2003, 05:11 PM
you have got to see this
click on "G5 Ordering" and click on the 1st pic ..
http://don.cc.vt.edu/
she's ordering :D
for those that dont get it, look at what she's running
Sabenth
Nov 16, 2003, 05:33 PM
Great to see Big Mac at number 3
now will one of you nerds please explain what big mac has been put together to do exactly: run the uni slush fund, or is it to do real work like solving cancer lol
it is really really really great to see that apple machines came in the list and blew away the rest of those machines in one go..
but still theres a lot to be done
well done vt
ddtlm
Nov 16, 2003, 05:54 PM
Classic:
I wonder why Apple isn't advertising on the top500.org site. You'd think that they could have a banner ad that points out that their computer is third and cost much less than its competition....
Do you honestly think anyone in the business doesn't know about the G5 cluster, its rank, its cost, and on exactly which algorithms it is effective?
Gyroscope
Nov 16, 2003, 06:42 PM
Originally posted by simply258
you have got to see this
click on "G5 Ordering" and click on the 1st pic ..
http://don.cc.vt.edu/
she's ordering :D
for those that dont get it, look at what she's running
:D :D :D :D :D :D
nathanziarek
Nov 16, 2003, 07:45 PM
Gotta love the Tunes...guess you have to get in the mood to place that kind of order.
Rocketman
Nov 16, 2003, 08:28 PM
Originally posted by simply258
you have got to see this
click on "G5 Ordering" and click on the 1st pic ..
http://don.cc.vt.edu/
she's ordering :D
for those that dont get it, look at what she's running
You can't order G5's on iTunes?
I like this one.
Too many boxes. (http://don.cc.vt.edu/g5modify/slides/IMG_2104.JPG)
ITR 81
Nov 16, 2003, 08:39 PM
I remember Va Tech saying they will be ordering another cluster in 2006. I'm hoping it's a G5 but if it is I wonder where the G5 will be by then? 4.5-5GHz??
Also I'm sure that once they Panthorize the cluster it will show an improvement.
Mr.Hey
Nov 16, 2003, 09:34 PM
Voting negative?, here is a big negative for you people ..l., . Congratulations Apple and VT. :)
Powerbook G5
Nov 16, 2003, 09:37 PM
Who knows, by 2006, they may be able to order 1100 dual G6 PowerMacs and hit number one.
msandersen
Nov 16, 2003, 10:44 PM
Originally posted by simply258
you have got to see this
click on "G5 Ordering" and click on the 1st pic ..
http://don.cc.vt.edu/
she's ordering :D
for those that dont get it, look at what she's running
Damn! The secret's out! :( The cluster is really a giant jukebox, sharing a central repository of music with the entire campus :)
Maybe they're setting up a new iTunes music store? Order your Eminem while cracking a few genetic sequences on the side. Would you like fries with that?
Bet the visualizations would be smooth! Cover the walls with giant plasma screens :D Psychedelic head-trip. Who needs drugs?
Sheebahawk
Nov 16, 2003, 10:49 PM
Originally posted by pkradd
As stated above, Big Mac is not yet operating at full potential. It may be able to add 3 or 4 more teraflops. There will be another list published in 6 months so there should be some new computers named and Big Mac may or may not stay where it is. As for the 3rd fastest computer in the world, I'd take that with a grain of salt. Many governments have computer systems that are not advertised (known) that may be as fast or faster than any of those posted on the list.
right, if they are not known, then how do you know about them? id take what you said with a shaker of salt. back your claim up with something please if you are serious about that assertion
PyroTurtle
Nov 16, 2003, 11:14 PM
lets not go into government computers...
considering the fact that there's no mention of any DoD computers, or other ciphering machine used by the US or UK...
like i said, lets not get into that subject
Powerbook G5
Nov 16, 2003, 11:20 PM
I hear that Apple actually has a computer running in a top secret underground lab that they stole from the Borg and use to reverse engineer their PPC chip designs. ;)
Gyroscope
Nov 16, 2003, 11:26 PM
Someone at IBM (maybe with a little help from Apple), after sitting on the PowerPC architecture for so many years and doing nothing much with it, has finally realized its potential over other CPU architectures (read x86) for high-end computing. Now all of a sudden there's a lot of PPC buzz going around. A few days ago we heard about Microsoft and Nintendo going with PPC for their next-gen gaming consoles. Then a supercomputer the size of a TV set, and confirmation that the VT PPC 970 cluster is 3rd in the world with such an incredible price/performance ratio delivered by Apple. Can you believe it! :D Apple-Price/Performance. Today I read an article on how SOI (Silicon on Insulator) is helping IBM sample 90nm CPUs without any of the problems that are plaguing Intel's shift to 90nm.
Right now, the future looks really promising.
- Windows NT: Windows Nice Try.
- Windows XP: Windows eXpress Problem
SiliconAddict
Nov 17, 2003, 12:51 AM
I didn't see the question posed, but I may have missed it. What is the possibility of these systems being upgraded to faster CPUs next year? Obviously upgrading a CPU would be a whole heck of a lot cheaper than buying brand new boxes in a few years. Yes, it's a good bet the architecture for the PowerMac will have changed by 2006, but by next Spring? Summer? Fall? Wouldn't it be cool if they could order a few thousand G5 chips running somewhere from 2.x - 3GHz and jump into second place? God knows the heat sink should easily handle any hotter chips in the future.
deadduck
Nov 17, 2003, 03:02 AM
I wouldn't bother. Let's see: sell 1100 G5s to students in 2 years (still quality machines), and put 1100 new machines with faster buses, RAM, etc. in their place. This could even be done over time with little or no downtime to the cluster. Much better for the performance of the cluster and a great deal for students.
theRebel
Nov 17, 2003, 09:49 AM
Originally posted by Rocketman
Someone posted a reminder that the VT cluster is composed of complete consumer-friendly CPUs with ALL associated accessories installed, and that this fact will result in a great resale value in 2-3 years when they "sidegrade" to G5-2.8 GHz boxes :)
3GHz G5 systems should be available in 2004, so I doubt that VT will be installing 2.8GHz G5s 2-3 years from now.
udannlin
Nov 17, 2003, 10:44 AM
2200 chips and 17 teraflops.. thats awesome...
Sabbath
Nov 17, 2003, 11:04 AM
Arrgghh, I don't know who makes the decisions on computer purchasing here at the University of Nottingham (UK). We appear at 339 with 128 Pentium 4 Xeons @ 2.8GHz, built this year. What a waste of money; why didn't we just buy some dual G5s? The resale value of the computers involved must be considered too, but clearly isn't.
It's probably the same guy who keeps us running Win98 on all the uni's computers, not offering 1 Mac for use by any of the 18000 students, and offering no wireless networking at all. Idiot.
thatwendigo
Nov 18, 2003, 01:08 AM
Originally posted by PyroTurtle
lets not go into government computers...
considering the fact that there's no mention of any DoD computers, or other ciphering machine used by the US or UK...
like i said, lets not get into that subject
Oh, really?
The number 2, 18, 19, 20, 27, 34, 56, 57, 67, 71, 113, 132, 147, 15, 169, 197, 285, 290-293, and 296 machines all belong to US Government owned facilities that do military work or are military installations.
Also, the UK's Atomic Program is #37.
That's a casual scanning, by the way. I'm sure I missed some.
stoid
Nov 18, 2003, 01:22 AM
Originally posted by Sabbath
It's probably the same guy who keeps us running Win98 on all the uni's computers, not offering 1 Mac for use by any of the 18000 students, and offering no wireless networking at all. Idiot.
We are touted as the "Electronic Campus Since 1987" and we are using about 2500 two-year-old 900MHz Gateway boxes and 4 G4 towers of some variety in the video editing lab. You can check out laptops for use from the campus library, only to find out that they are 3-inch-thick, bigger-than-your-Physics-book IBM ThinkPads from 1999. Maybe that's why they don't have wireless networks on the campus: their portable technology is so hopelessly out of date that only those with their own computers could use it.
Powerbook G5
Nov 18, 2003, 09:04 AM
We don't have wireless here, heck, we didn't even have internet at all until this semester and it's a terribly unreliable DSL network that literally goes down two or three times a day on average. We just "upgraded" all of the Dell boxes in the library to some variety of Pentium 4 and from Windows 2000 to Windows XP Pro (a downgrade in my opinion). Every classroom now has a Dell and a nifty Hitachi A/V projector which 98% of the teachers never utilize, and the only Macs we have are a few original iMacs (hockey puck mice and all), a few Yikes G4s, and OS X 10.0 server running on them. It's pretty sad to say the least.
PyroTurtle
Nov 18, 2003, 01:53 PM
yes, there are some government computers listed...
my point was all the code cracking machines aren't listed. none of them.
as a side note, i think they should list the seti@home and folding@home programs on that list ;)