iChan said: Seeing as the Earth Simulator cost $300M+, isn't it just a matter of time before someone out there plops down $100M+ on an Xserve supercomputer? I mean, Apple supercomputers scale linearly, so the extra money put into an Xserve cluster should correlate directly with the increase in power.
Let's see... going by VT's original "Big Mac," which cost $5.2M, they got 10.28 TFlops.
The Earth Simulator cost $300M+ and "only" produced 35.86 TFlops.
Therefore, if someone out there spent $300M on an Apple cluster, they could probably get 593 TFlops!!!!! WHOAH!!!!! That would OWN anything out there in the world!!!
What does everyone think...?? Should it be done? Is my reasoning incorrect?
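The extrapolation in the post can be sketched in a few lines of Python. This is a back-of-envelope check only; the "linear scaling" premise is the poster's assumption, not a fact, and only the dollar and TFlops figures quoted in the thread go in:

```python
# Back-of-envelope check of the linear-scaling extrapolation above.
# Figures from the thread: VT's "Big Mac" ($5.2M, 10.28 TFlops)
# and the Earth Simulator (~$300M, 35.86 TFlops).

BIG_MAC_COST_USD = 5.2e6
BIG_MAC_TFLOPS = 10.28
ES_COST_USD = 300e6
ES_TFLOPS = 35.86

# TFlops per dollar for each system
big_mac_tflops_per_dollar = BIG_MAC_TFLOPS / BIG_MAC_COST_USD
es_tflops_per_dollar = ES_TFLOPS / ES_COST_USD

# Hypothetical $300M Xserve cluster, assuming perfectly linear scaling
hypothetical_tflops = big_mac_tflops_per_dollar * ES_COST_USD

print(f"Xserve cluster at $300M (linear scaling): {hypothetical_tflops:.0f} TFlops")
print(f"Price/performance vs E.S.: {big_mac_tflops_per_dollar / es_tflops_per_dollar:.1f}x")
```

This reproduces the ~593 TFlops figure in the post; whether peak TFlops actually scale linearly with spend is exactly what the reply below disputes.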
A major difference between a real supercomputer like the Earth Simulator and a cluster built as a collection of inexpensive machines is data bandwidth.
The 1GHz FSB "elastic interface" between an Xserve G5's 2GHz CPU and its memory controller has an effective bandwidth of about 3.2 GBytes/sec. That bandwidth is only sufficient to feed about 400 million 64-bit floating-point operands per second to each CPU for number crunching -- i.e. about 400 MFlops of memory-fed throughput per CPU, or 0.8 GFlops per dual-CPU 2GHz Xserve.
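As a sketch of that arithmetic (the 3.2 GBytes/sec figure and 8 bytes per 64-bit operand are the only inputs; one flop per operand fetched from RAM is the simplifying assumption):

```python
# Memory-bandwidth ceiling on sustained throughput for a dual-2GHz Xserve G5,
# assuming one floating-point operation per 64-bit operand fetched from RAM.

FSB_BANDWIDTH_BYTES = 3.2e9   # effective FSB bandwidth per CPU (from above)
BYTES_PER_OPERAND = 8         # one 64-bit (double-precision) number
CPUS_PER_NODE = 2

operands_per_sec = FSB_BANDWIDTH_BYTES / BYTES_PER_OPERAND  # 400 million/sec
per_cpu_gflops = operands_per_sec / 1e9                     # ~0.4 GFlops
per_node_gflops = CPUS_PER_NODE * per_cpu_gflops            # ~0.8 GFlops

print(f"Per CPU:  {per_cpu_gflops:.1f} GFlops (memory-fed)")
print(f"Per node: {per_node_gflops:.1f} GFlops (memory-fed)")
```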
As for inter-node networking, the Virginia Tech cluster's Infiniband interfaces have a raw bandwidth of 10 Gigabits/sec.
The Army's MACH-5 Xserve cluster uses an even slower networking technology -- Gigabit Ethernet, with a raw bandwidth of 1 Gigabit/sec.
With the Earth Simulator's 640 eight-CPU nodes, the memory bandwidth to each CPU is 32 GBytes/sec -- i.e. an order of magnitude larger than an Xserve's. In fact, the E.S.'s 10 Terabytes of shared memory has a speed comparable to a 2GHz G5 CPU's on-die L1 cache!
That memory bandwidth is sufficient to feed 4 billion 64-bit floating-point operands per second to each of the E.S.'s 5,120 CPUs -- i.e. to sustain about 20 TFlops with central-memory-based data. As a comparison, an 1100-node dual-2GHz Xserve cluster can sustain a memory-based throughput of about 0.88 TFlops. The two architectures are thus not really in the same league.
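The same bandwidth-ceiling arithmetic, applied to both machines using only the figures quoted above (again assuming one flop per operand fetched from memory):

```python
# Memory-fed throughput ceilings: Earth Simulator vs. an 1100-node
# dual-2GHz Xserve cluster, using the bandwidth figures quoted above.

BYTES_PER_OPERAND = 8

# Earth Simulator: 32 GBytes/sec of memory bandwidth to each of 5,120 CPUs
es_operands_per_cpu = 32e9 / BYTES_PER_OPERAND   # 4 billion operands/sec
es_tflops = es_operands_per_cpu * 5120 / 1e12    # ~20.5 TFlops

# Xserve cluster: 0.8 GFlops of memory-fed throughput per dual-CPU node
xserve_tflops = 1100 * 0.8e9 / 1e12              # 0.88 TFlops

print(f"E.S. memory-fed ceiling:      {es_tflops:.2f} TFlops")
print(f"Xserve cluster ceiling:       {xserve_tflops:.2f} TFlops")
print(f"Ratio: {es_tflops / xserve_tflops:.0f}x")
```

The resulting gap (roughly 23x on sustained, memory-fed throughput) is what makes the peak-TFlops comparison in the first post misleading.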
Each link in the network interconnecting the E.S.'s 640 nodes has a raw bandwidth of 128 Gigabits/sec, which is thus significantly faster than the VTech cluster's Infiniband.
It's also interesting to note that each link of that custom network has several times the bandwidth of a G5 Xserve's CPU-RAM connection!
Furthermore, on a thousand-node switched Infiniband or Ethernet network, data packets must typically travel through several switches before reaching their destination, so traffic experiences significant latency. The E.S.'s network, on the other hand, is fully meshed: each node can communicate with any other node in a single low-latency network hop!
Thus, for problems that are not embarrassingly parallel -- those requiring significant data exchange between CPUs or across nodes, or crunching through data sets too large to fit in a CPU's small L1 cache -- the Earth Simulator might be an order of magnitude faster than a PC/Mac-based cluster whose theoretical peak performance, based on CPU speed alone, might appear superficially comparable.