offline thread about clusters:
Hey everyone....
Our cluster is no. 65 on the list... and yes, the timing is very important. We only had it on for about a week when the deadline for the list came around. We didn't have anything optimized properly, and many of the nodes were mis-wired (i.e., the management consoles are on 100Base-T and the compute nodes are on GigE, but many of the compute nodes were plugged into the 100 switches instead of the gig switches).
Anyway, we got on at 65, although I have been told that we are now benchmarking fast enough to be in the top 30...
Anyway, I mentioned this VT rumor to some of the guys at work, and these were their responses. Thought you might be interested:
---
This puzzles me - they will have approx 2200 x 2GHz CPUs and we
have 1024 x 2.? GHz CPUs so they will have approx twice the raw CPU
power we have. Why then do they think they are going to get 5 times
the max throughput we get - 10 Tflops vs approx 2 TFlops?
Is someone on crack or is it all down to the interconnects
(Infiniband, whatever that is)? It seems incredible that all that
speedup would be simply due to their CPUs being 64bit while ours
are 32bit. I'll be very interested to see how this project pans out
and whether their cluster ever gets even close to that 10 Tflop number
they have mentioned.
John (the pessimist)
---
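To put John's arithmetic in one place, here is a rough sketch comparing aggregate clock rate against the throughput figures he quotes. The clock speed of our CPUs is only given as "2.?" GHz, so `our_clock_ghz` is a placeholder parameter and the values looped over are illustrative guesses, not real specs. Aggregate GHz is also a crude proxy for compute power, since it ignores flops per cycle and the interconnect.

```python
# Back-of-envelope check of the numbers in John's message.
# CPU counts and TFlops figures come from the thread; our clock speed is
# NOT given there ("2.? GHz"), so it is passed in as a hypothetical value.

VT_CPUS, VT_CLOCK_GHZ, VT_TFLOPS = 2200, 2.0, 10.0
OUR_CPUS, OUR_TFLOPS = 1024, 2.0


def aggregate_ghz(cpus, clock_ghz):
    """Total clock rate across all CPUs (a crude 'raw power' proxy)."""
    return cpus * clock_ghz


def compare(our_clock_ghz):
    """Return (raw clock ratio, claimed throughput ratio, unexplained factor)."""
    clock_ratio = aggregate_ghz(VT_CPUS, VT_CLOCK_GHZ) / aggregate_ghz(
        OUR_CPUS, our_clock_ghz
    )
    claimed_ratio = VT_TFLOPS / OUR_TFLOPS
    return clock_ratio, claimed_ratio, claimed_ratio / clock_ratio


if __name__ == "__main__":
    # Illustrative clock guesses only; the thread does not say which applies.
    for guess in (2.0, 2.4, 2.8):
        clock_ratio, claimed_ratio, gap = compare(guess)
        print(
            f"our clock {guess} GHz: raw ratio {clock_ratio:.2f}x, "
            f"claimed {claimed_ratio:.1f}x, unexplained factor {gap:.2f}x"
        )
```

Whatever the real clock turns out to be, the gap between the raw ratio and the claimed ratio is the part that has to come from somewhere other than CPU count and clock speed.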
and the response from our cluster admin:
---
The interconnects play a HUGE role in achieving those sorts of numbers. The latency with GigE is much higher than with Myrinet or any other high-speed interconnect. I can achieve about 500 GFlops with the 128 Myrinet nodes. It takes over twice that number of GigE nodes to produce similar numbers. Also, the G5's floating-point performance is much higher than that of the Xeon processors. Being able to address more memory via 64-bit is advantageous as well. The early Opteron benchmark numbers show this very well.
For your reference:
http://www.amd.com/us-en/Processors/ProductInformation/0,,30_118_8796_8800~70045,00.html
http://www.apple.com/powermac/
---
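A quick sketch of what the admin's figures imply per node. The 128 nodes and roughly 500 GFlops are from his message. The "over twice that number" remark only gives an upper bound for GigE, so that is how it is treated here. The flops-per-cycle constants are my own assumption about the commonly cited double-precision peaks (4 for the G5, 2 for Xeons of that generation); they are not from the thread, and real benchmark results land well below peak.

```python
# Per-node throughput implied by the admin's numbers, plus a peak estimate.

MYRINET_NODES = 128      # from the thread
MYRINET_GFLOPS = 500.0   # "about 500", from the thread


def per_node_gflops(total_gflops, nodes):
    """Average sustained GFlops contributed by each node."""
    return total_gflops / nodes


myrinet_per_node = per_node_gflops(MYRINET_GFLOPS, MYRINET_NODES)

# "Over twice that number" of GigE nodes for similar output means each GigE
# node delivers LESS than this value; it is an upper bound, not a measurement.
gige_per_node_upper = per_node_gflops(MYRINET_GFLOPS, 2 * MYRINET_NODES)

# ASSUMPTION (not from the thread): peak double-precision flops per cycle.
G5_FLOPS_PER_CYCLE = 4
XEON_FLOPS_PER_CYCLE = 2


def peak_gflops_per_cpu(clock_ghz, flops_per_cycle):
    """Theoretical peak for one CPU; sustained numbers are always lower."""
    return clock_ghz * flops_per_cycle


if __name__ == "__main__":
    print(f"Myrinet: {myrinet_per_node:.2f} GFlops per node")
    print(f"GigE:    under {gige_per_node_upper:.2f} GFlops per node")
    print(f"G5 at 2.0 GHz peak: {peak_gflops_per_cpu(2.0, G5_FLOPS_PER_CYCLE):.1f} GFlops")
```

If the flops-per-cycle assumption holds, a G5 has twice the peak of a Xeon at the same clock, which together with the interconnect would account for much of the gap John is asking about.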
John's point is well taken, but Virginia Tech will have many more options with a 64-bit system than people limited to 32 bit, re: memory access per node. However, note that Cray is now using the AMD chip and a newly developed multi-CPU backplane to put Sandia Labs back into Top500 contention by creating a system like TimeLogic and Paracel: 32 bit, standard cheap memory and 2G per CPU, but very few interconnects. Everything is on 16 enormous backplanes with full-speed access to the memory pipeline, and they have the potential to exceed 32K 32-bit processors and 64 TB of RAM per system (Paracel was at 18K CPUs at their last iteration). That makes Cray the lowest cost per clock cycle at roughly $16M (3x our price) per system and potentially exceeding 20 TF (10x our performance and 2x Virginia Tech, but more scalable). Also, the Cray system could be an Unreal server without modifications.
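For anyone who wants to sanity-check those Cray figures, here is a small sketch that derives the implied numbers from the ratios quoted above. Everything is taken from this post ($16M, 20 TF, "3x our price", "10x our performance", "2x Virginia Tech", 32K CPUs, 64 TB); the derived price and throughput for our cluster are just those ratios inverted, not independently confirmed figures, and "K" is assumed to mean 1024.

```python
# Consistency check on the Cray figures, using only ratios from the post.

CRAY_PRICE_MUSD = 16.0   # "roughly $16M"
CRAY_TFLOPS = 20.0       # "potentially exceeding 20 TF"

our_price_musd = CRAY_PRICE_MUSD / 3   # Cray is "3x our price"
our_tflops = CRAY_TFLOPS / 10          # Cray is "10x our performance"
vt_tflops = CRAY_TFLOPS / 2            # Cray is "2x Virginia Tech"


def cost_per_tflop(price_musd, tflops):
    """Millions of dollars per TFlop."""
    return price_musd / tflops


def ram_per_cpu_gb(total_tb, cpus):
    """GB of RAM per CPU, taking 1 TB = 1024 GB."""
    return total_tb * 1024 / cpus


if __name__ == "__main__":
    print(f"Cray: ${cost_per_tflop(CRAY_PRICE_MUSD, CRAY_TFLOPS):.2f}M per TFlop")
    print(f"Ours: ${cost_per_tflop(our_price_musd, our_tflops):.2f}M per TFlop")
    print(f"Implied VT throughput: {vt_tflops:.0f} TFlops")
    # ASSUMPTION: "32K" processors means 32 * 1024.
    print(f"RAM per CPU: {ram_per_cpu_gb(64, 32 * 1024):.0f} GB")
```

The implied 2 TFlops for us and 10 TFlops for Virginia Tech line up with John's numbers, and 64 TB across 32K processors matches the 2G per CPU mentioned above.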