
G4scott

macrumors 68020
Original poster
Jan 9, 2002
2,225
5
USA_WA
UT (my university :D ) has just announced that they now have the fastest supercomputer in Texas, called Lonestar, capable of 3.7 Teraflops.

Unfortunately, this wonderful supercomputer was built by Cray using Dell components (Dell didn't really build it, Cray did...). What's worse is that this supercomputer uses 600 3.06GHz Xeon processors.

Now, doing some simple math, if a 2200 processor G5 cluster can put out at least 30 Teraflops (I can't be sure, because I don't know how many gigaflops a dual 2GHz G5 puts out), a G5 cluster (which would also cost less) would be at least twice as fast. It would also be able to handle 64-bit calculations, which this new supercomputer can't...

What are your opinions about a Dell supercomputer?

Here are some links
http://www.tacc.utexas.edu/
http://www.tacc.utexas.edu/general/press/20031003_01.php
 
Re: University of Texas announces new supercomputer

Originally posted by G4scott
UT (my university :D ) has just announced that they now have the fastest supercomputer in Texas, called Lonestar, capable of 3.7 Teraflops.
Something is wrong here: a boast from the Texans that only compares themselves to other Texans. :eek:

And of course it'll be a Dell...

:p
 
I guess everything isn't bigger in Texas. When was it built?

iJon
 
It was just built recently.

I'll say it. Texas sucks as far as supercomputers go, although we have IBM and Dell...

I've seen some other supercomputers on campus, and my G4 laptop is actually faster than a couple of them...
 
Haha, aren't doing too good in football either. Just givin' ya a hard time.

iJon
 
Originally posted by G4scott
It was just built recently.

I'll say it. Texas sucks as far as supercomputers go, although we have IBM and Dell...

I've seen some other supercomputers on campus, and my G4 laptop is actually faster than a couple of them...

Hehe.

G4> SuperComputer!

W00 ;)

Glad they have a new computer to play with.

BTW, you can't just multiply; this is the real world, not theory.
 
Originally posted by MrMacman
BTW, you can't just multiply; this is the real world, not theory.

Still...

600 x 3 GHz = 3.7 Teraflops
1100 Dual (2200) 2 GHz = 30 Teraflops?

Doing the math... yep, twice as fast (4x, price wise).

Theoretical, shmeoretical. ;)
 
Originally posted by iJon
Haha, aren't doing too good in football either. Just givin' ya a hard time.

iJon

Our season record is 4-1, and we just beat KSU. We're also currently ranked 13 (which should go up). You might be thinking of Texas A&M...

Anyways, back to this supercomputer... I think it's a major publicity stunt for dell...

Oh, and as for the theoretical performance of Virginia Tech's cluster, I'm not sure what it'll actually be. I've been trying to find out how many GFlops a dual 2GHz G5 can put out, but I haven't been able to find actual numbers, so I have to compare floating point benchmarks...

Everything I'm saying here is theoretical, but consider this. A single 2GHz G5 processor is about 1.3x faster than a single 3.06GHz Xeon. If a 600 processor Xeon cluster puts out 3.7 TFlops, a 600 processor G5 cluster should put out somewhere around 4.8 TFlops, which is about .008 TFlops per processor. In a 2200 processor G5 cluster, you would be at around 17 TFlops. So my initial assumption of 30 TFlops was wrong, I'll admit. But a 2200 processor G5 cluster would still be about 4 teraflops faster than a 2200 processor Xeon cluster.

Of course, the fastest Xeon supercomputer has 1024 1.8GHz Xeons, and puts out 2.2 TFlops. The next fastest has only 600 2.4GHz Xeons, and is built using Myrinet (the same interconnect this new cluster is being built on), but still only puts out 2.004 TFlops. What's even more amazing is that a Dell PowerEdge 2650 (which is what these computers are) cluster with 1024 3GHz Xeons puts out only 1.068 TFlops, yet has a theoretical performance of 6.144 TFlops. So while the theoretical performance is almost never the actual real-world performance, it can give you a good idea of how a computer will stack up against its competition.
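For anyone who wants to rerun the estimate, here is a rough Python sketch of the linear scaling used above. The function name `scale_linear` is made up for illustration, and the 1.3x per-processor ratio is an assumption from the post rather than a measured figure. It ignores interconnect and memory losses entirely.

```python
# Back-of-envelope linear scaling; all inputs are the figures quoted in the post.
XEON_CLUSTER_TFLOPS = 3.7    # quoted for the 600-processor Lonestar
XEON_PROCESSORS = 600
G5_SPEEDUP = 1.3             # assumed G5-vs-Xeon per-processor ratio (not measured)
TARGET_PROCESSORS = 2200


def scale_linear(tflops, procs, target_procs, speedup=1.0):
    """Scale a cluster figure to another processor count, assuming perfect scaling."""
    per_processor = tflops / procs
    return per_processor * speedup * target_procs


xeon_2200 = scale_linear(XEON_CLUSTER_TFLOPS, XEON_PROCESSORS, TARGET_PROCESSORS)
g5_2200 = scale_linear(XEON_CLUSTER_TFLOPS, XEON_PROCESSORS, TARGET_PROCESSORS, G5_SPEEDUP)

print(f"2200 Xeons: {xeon_2200:.1f} TFlops")
print(f"2200 G5s:   {g5_2200:.1f} TFlops")
print(f"Difference: {g5_2200 - xeon_2200:.1f} TFlops")
```

Perfect scaling is the optimistic case, so these numbers are best read as upper bounds on what the estimate implies.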

I just think that our university could've gone with a better choice of supercomputer cluster, but then again, you guys want your G5's to ship sometime this year :D
 
Re: University of Texas announces new supercomputer

Originally posted by G4scott
Now, doing some simple math, if a 2200 processor G5 cluster can put out at least 30 Teraflops (I can't be sure, because I don't know how many gigaflops a dual 2GHz G5 puts out), a G5 cluster (which would also cost less) would be at least twice as fast. It would also be able to handle 64-bit calculations, which this new supercomputer can't...

What are your opinions about a Dell supercomputer?

Not sure how a single processor 2 GHz G5 would score in Linpack, but a 3.06 GHz Pentium 4 scores between 5% and 26% worse than a 1.7 GHz Power4+ depending on the problem size (put the other way, the Power4+ is 5% to 35% faster).

http://www.netlib.org/benchmark/performance.pdf (see page 6)

Name: Intel Pentium 4@ 3.06 GHz
n=100: 1414 MFLOP/S
n=1000: 2880 MFLOP/S
Theoretical Peak: 6120 MFLOP/S

Name: IBM Power 4@1.70 GHz (IBM P-Series 655)
n=100: 1486 MFLOP/S
n=1000: 3884 MFLOP/S
Theoretical Peak: 6800 MFLOP/S

Name: Intel Itanium 2@1.50 GHz (HP RX-2600)
n=100: 1635 MFLOP/S
n=1000: 5303 MFLOP/S
Theoretical Peak: 6000 MFLOP/S

So the question will be how close the 2 GHz G5 performs relative to the 1.7 GHz Power4+. On one hand, the G5 is clocked about 18% faster and will have a slightly faster bus. On the other hand, the Power4+ has two cores and a huge L3 cache (most programs will completely fit in this cache).
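The Linpack figures above can also be read as a fraction of theoretical peak. A small Python sketch, using only the numbers listed in this post (the dictionary layout and the `efficiency` helper are just for illustration):

```python
# (n=100, n=1000, theoretical peak), all in MFLOP/s, as listed above.
results = {
    "Intel Pentium 4 @ 3.06 GHz": (1414, 2880, 6120),
    "IBM Power 4 @ 1.70 GHz": (1486, 3884, 6800),
    "Intel Itanium 2 @ 1.50 GHz": (1635, 5303, 6000),
}


def efficiency(measured_mflops, peak_mflops):
    """Sustained rate as a fraction of theoretical peak."""
    return measured_mflops / peak_mflops


for name, (n100, n1000, peak) in results.items():
    print(f"{name}: n=100 {efficiency(n100, peak):.0%}, "
          f"n=1000 {efficiency(n1000, peak):.0%} of peak")
```

On these numbers the larger problem size gets much closer to peak on every chip, which is why the choice of problem size matters so much when comparing.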
 
G5 Cluster vs Xeon Cluster

Thus far, nobody has identified what kind of FLOPS they are talking about here (LINPACK flops or peak flops???). For peak, the 2GHz G5 should hit 8Gigaflops for double-precision multiply-add instructions and only 4 gigaflops if you can't make use of the multiply add (it has 2 FPUs that exec 1 FP op per clock cycle).

The x86 with SSE2 has similar capabilities. You would use the SIMD unit to compute 2 DP flops per cycle. However, it cannot perform a fused multiply-add. So its peak performance is 6 gigaflops double-precision. The caveats here are of course
1) SSE2 requires that you execute two of the same instructions simultaneously (i.e. two adds or two multiplies, but not any mix of instructions)
2) For the G5, not every floating point operation is going to be a multiply-add.
3) The G5 has 32 floating point registers to work with whereas the SSE2 offers effectively half that much
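The peak figures above are just clock rate times floating point results per cycle. A minimal Python sketch of that arithmetic, assuming the per-cycle counts given in this post (the function name `peak_gflops` is hypothetical, and the counts are not taken from vendor datasheets):

```python
def peak_gflops(clock_ghz, flops_per_cycle):
    """Theoretical peak in GFlops: clock (GHz) times flops retired per cycle."""
    return clock_ghz * flops_per_cycle


# G5: 2 FPUs, 1 op per cycle each; a fused multiply-add counts as 2 flops.
g5_with_fma = peak_gflops(2.0, 2 * 2)
g5_without_fma = peak_gflops(2.0, 2 * 1)

# Xeon with SSE2: 2 double-precision flops per cycle, no fused multiply-add.
xeon_sse2 = peak_gflops(3.06, 2)

# Cluster peaks in TFlops for the processor counts mentioned in the thread.
g5_cluster_tflops = g5_with_fma * 2200 / 1000
xeon_cluster_tflops = xeon_sse2 * 600 / 1000

print(f"G5 peak: {g5_with_fma:.1f} GFlops (with FMA), {g5_without_fma:.1f} (without)")
print(f"Xeon peak: {xeon_sse2:.2f} GFlops")
print(f"2200 G5s: {g5_cluster_tflops:.1f} TFlops, 600 Xeons: {xeon_cluster_tflops:.2f} TFlops")
```

The 600-Xeon result comes out very close to the 3.7 TFlops quoted for Lonestar, which seems to suggest that figure is a peak number rather than a Linpack score, though the announcement would need to confirm that.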

In practice, the peak performance is only usable if you have computational kernels with extremely high computational intensity (many flops per byte of data loaded). The NEC SX-6 (a vector supercomputer with 8Gigaflops/processor) offers about 8bytes/flop performance -- that is, it can load 8 bytes of operands for each flop it can execute (it also has nearly 3k of floating point registers). By contrast the G5 and P4 are offering .5 bytes/flop. So we typically see low efficiency (15% or so) for the microprocessor based machine running real scientific workloads whereas the Japan Earth Simulator and similar machines regularly hit >50% efficiency (50% of their peak rated performance).
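To make the bytes/flop argument concrete, here is a rough Python sketch of a bandwidth-bound estimate. It assumes a kernel is limited only by how fast operands can be loaded, and the kernel's demand of 4 bytes per flop is a hypothetical value picked for illustration, not a measurement of any real code. The machine figures are the ones quoted above.

```python
def fraction_of_peak(machine_bytes_per_flop, kernel_bytes_per_flop):
    """Upper bound on achievable fraction of peak for a memory-bound kernel.

    If the kernel needs more bytes per flop than the machine can supply,
    the FPUs sit idle in proportion to the shortfall.
    """
    return min(1.0, machine_bytes_per_flop / kernel_bytes_per_flop)


KERNEL_BYTES_PER_FLOP = 4.0   # hypothetical kernel demand (assumption)

machines = {
    "G5 / P4 (0.5 bytes/flop)": 0.5,
    "NEC SX-6 (8 bytes/flop)": 8.0,
}

for name, supply in machines.items():
    bound = fraction_of_peak(supply, KERNEL_BYTES_PER_FLOP)
    print(f"{name}: at most {bound:.1%} of peak")
```

With that assumed demand the microprocessors are capped in the low teens of percent while the vector machine is not bandwidth-limited at all, which is roughly the shape of the efficiency gap described above.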

For the Linpack benchmark used for the Top500, you can do a lot to fit the problem size per-processor so that it will stay mostly cache-resident. So the interconnect will be somewhat critical to the scoring. I have little doubt they can hit a good fraction of the peak rated performance regardless of what the usefulness is for a typical scientific workload.

The very serious issue with these clusters is going to be parallel I/O performance. NFS simply doesn't cut it for large scale clusters. Linux has many options available including Lustre, IBM's Linux implementation of GPFS, and even Clemson's PVFS. None of these parallel filesystems have BSD/Darwin ports as of yet. NFS servers are simply inadequate for clusters once they scale beyond 128 nodes, so I assume Cray will be using Lustre, given that is the filesystem they will be using for ASCI Red Storm (a cluster of AMD x86-64/Opteron processors with a custom interconnect).

Until someone ports Lustre or GPFS to Darwin, it's going to be difficult to run exceptionally large G5 clusters as a production resource despite the superior CPU/memory subsystem architecture. Until then, I'm not certain that people building massive G5 clusters have the best interests of production scientific users in mind (it's more of a CS/EE department experiment). When/if someone does port some serious parallel I/O-capable filesystems to the G5, it will be an overwhelming favorite in the cluster business.

Since IBM ported their XL Fortran and XL C compilers to the G5, it's just about as good an environment as I could expect on the largest IBM-constructed ASCI computers. It's just a shame the cluster filesystem options are so poor.
 
And in other news...

Because of Mack Brown's inability to win the Red River Shootout, the UT football team will take advantage of the new super computer it has built to make play calls during the game this weekend.

:) ;)

Regards,
Gus
 
Zephyr Aardvark:

Did you post that long explanation right after me? I've explicitly stated the sustained Linpack performance of several processors at different problem sizes, as well as the theoretical peak performance of each processor.
 
It seems as if there has been some controversy over this new supercomputer already. Apparently, MacNN reported that this thing cost 38 million dollars, but that money also went to other stuff, like:

4 endowed faculty chairs and research funds
The completion of the 4th floor of the ACES building (which makes sense now, because when I went in that building, I noticed that the elevators had plywood all over them, and that the 4th floor was off limits...)
Two supercomputers (the Cray-Dell, and an IBM Power4 Cluster)
two massive storage systems
and stuff to increase UT's networking infrastructure.

This 38 million is to be spent over 5 years...

Now, they still could've gotten a much faster supercomputer if they went with G5's, though...

Alas, I probably won't get to touch this thing for another three years... Oh well...
 
Yeah, the VT supercomputer costs posted on MacNN only included the hardware; most of the expenditures are going to come from other things like operating costs.

I've also realized that the Virginia Tech supercomputer consists of 1100 machines each with 2 processors in them (for a grand total of 2200 processors). Which would only make sense since
17.6 Tflops=8 Gflops (peak of single 2 GHz G5)x2200
 