iChan said: Seeing as the Earth Simulator cost $300M+, isn't it just a matter of time before someone out there plops down $100M+ on an Xserve supercomputer? I mean, Apple supercomputers scale linearly, so the extra money put into an Xserve cluster should correlate directly with the increase in power.
Let's see... going by VT's original "Big Mac," which cost $5.2M, they got 10.28 TFlops.
The Earth Simulator cost $300M+ and "only" produced 35.86 TFlops.
Therefore, if someone out there spent $300M on an Apple cluster, they could probably get 593 TFlops!!!!! WHOAH!!!!! That would OWN anything out there in the world!!!
What does everyone think...?? Should it be done? Is my reasoning incorrect?
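The extrapolation in the post can be sketched in a few lines of Python. This is a back-of-envelope check only; the "linear scaling" premise is the poster's assumption, not a fact, and only the dollar and TFlops figures quoted in the thread go in:

```python
# Back-of-envelope check of the linear-scaling extrapolation above.
# Figures from the thread: VT's "Big Mac" ($5.2M, 10.28 TFlops)
# and the Earth Simulator (~$300M, 35.86 TFlops).

BIG_MAC_COST_USD = 5.2e6
BIG_MAC_TFLOPS = 10.28
ES_COST_USD = 300e6
ES_TFLOPS = 35.86

# TFlops per dollar for each system
big_mac_tflops_per_dollar = BIG_MAC_TFLOPS / BIG_MAC_COST_USD
es_tflops_per_dollar = ES_TFLOPS / ES_COST_USD

# Hypothetical $300M Xserve cluster, assuming perfectly linear scaling
hypothetical_tflops = big_mac_tflops_per_dollar * ES_COST_USD

print(f"Xserve cluster at $300M (linear scaling): {hypothetical_tflops:.0f} TFlops")
print(f"Price/performance vs E.S.: {big_mac_tflops_per_dollar / es_tflops_per_dollar:.1f}x")
```

This reproduces the ~593 TFlops figure in the post; whether peak TFlops actually scale linearly with spend is exactly what the reply below disputes.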
A major difference between a real supercomputer like the Earth Simulator and a cluster built as a collection of inexpensive machines is data bandwidth.
The 1GHz FSB "elastic interface" between an Xserve G5's 2GHz CPU and its memory controller has an effective bandwidth of about 3.2 GBytes/sec. That bandwidth is only sufficient to feed about 400 million 64-bit floating-point operands per second to each CPU for number crunching -- i.e. about 400 MFlops of memory-fed throughput per CPU, or 0.8 GFlops per dual-CPU 2GHz Xserve.
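As a sketch of that arithmetic (the 3.2 GBytes/sec figure and 8 bytes per 64-bit operand are the only inputs; one flop per operand fetched from RAM is the simplifying assumption):

```python
# Memory-bandwidth ceiling on sustained throughput for a dual-2GHz Xserve G5,
# assuming one floating-point operation per 64-bit operand fetched from RAM.

FSB_BANDWIDTH_BYTES = 3.2e9   # effective FSB bandwidth per CPU (from above)
BYTES_PER_OPERAND = 8         # one 64-bit (double-precision) number
CPUS_PER_NODE = 2

operands_per_sec = FSB_BANDWIDTH_BYTES / BYTES_PER_OPERAND  # 400 million/sec
per_cpu_gflops = operands_per_sec / 1e9                     # ~0.4 GFlops
per_node_gflops = CPUS_PER_NODE * per_cpu_gflops            # ~0.8 GFlops

print(f"Per CPU:  {per_cpu_gflops:.1f} GFlops (memory-fed)")
print(f"Per node: {per_node_gflops:.1f} GFlops (memory-fed)")
```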
As for inter-node networking, the Virginia Tech cluster's Infiniband interfaces have a raw bandwidth of 10 Gigabits/sec.
The Army's MACH-5 Xserve cluster uses an even slower networking technology -- Gigabit Ethernet, with a raw bandwidth of 1 Gigabit/sec.
With the Earth Simulator's 640 eight-CPU nodes, the memory bandwidth to each CPU is 32 GBytes/sec -- i.e. an order of magnitude larger than an Xserve's. In fact, the E.S.'s 10 Terabytes of shared memory has a speed comparable to a 2GHz G5 CPU's on-die L1 cache!
That memory bandwidth is sufficient to feed 4 billion 64-bit floating-point operands per second to each of the E.S.'s 5,120 CPUs -- i.e. to sustain about 20 TFlops with central-memory-based data. As a comparison, an 1100-node dual-2GHz Xserve cluster can sustain a memory-based throughput of about 0.88 TFlops. The two architectures are thus not really in the same league.
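The same bandwidth-ceiling arithmetic, applied to both machines using only the figures quoted above (again assuming one flop per operand fetched from memory):

```python
# Memory-fed throughput ceilings: Earth Simulator vs. an 1100-node
# dual-2GHz Xserve cluster, using the bandwidth figures quoted above.

BYTES_PER_OPERAND = 8

# Earth Simulator: 32 GBytes/sec of memory bandwidth to each of 5,120 CPUs
es_operands_per_cpu = 32e9 / BYTES_PER_OPERAND   # 4 billion operands/sec
es_tflops = es_operands_per_cpu * 5120 / 1e12    # ~20.5 TFlops

# Xserve cluster: 0.8 GFlops of memory-fed throughput per dual-CPU node
xserve_tflops = 1100 * 0.8e9 / 1e12              # 0.88 TFlops

print(f"E.S. memory-fed ceiling:      {es_tflops:.2f} TFlops")
print(f"Xserve cluster ceiling:       {xserve_tflops:.2f} TFlops")
print(f"Ratio: {es_tflops / xserve_tflops:.0f}x")
```

The resulting gap (roughly 23x on sustained, memory-fed throughput) is what makes the peak-TFlops comparison in the first post misleading.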
Each link in the network interconnecting the E.S.'s 640 nodes has a raw bandwidth of 128 Gigabits/sec, which is thus significantly faster than the VTech cluster's Infiniband.
It's also interesting to note that each link of that custom network has several times the bandwidth of a G5 Xserve's CPU-RAM connection!
Furthermore, on a thousand-node switched Infiniband or Ethernet network, data packets must typically travel through several switches before reaching their destination, so traffic experiences significant latency. The E.S.'s network, on the other hand, is fully meshed: each node can communicate with any other node in a single low-latency network hop!
Thus, for problems that are not embarrassingly parallel -- those requiring significant data exchange between CPUs or across nodes, or crunching through data sets too large to fit in a CPU's small L1 cache -- the Earth Simulator might be an order of magnitude faster than a PC/Mac-based cluster whose theoretical peak performance, based on CPU speed alone, might appear superficially comparable.