More Virginia Tech Cluster Details
MacRumors
Sep 3, 2003, 01:57 PM
A few articles provide some more details on Virginia Tech's upcoming PowerMac G5 cluster.
TechNewsWorld (http://www.technewsworld.com/perl/story/31485.html) provides quotes from Dean Hassan Aref that the total price tag on the cluster "is probably a factor of 10 lower than a machine in this class in the past".
The PowerMacs will be running Mac OS X, and as mentioned before, will be linked with hardware from Mellanox and Cisco.
Roanoke Times (http://www.roanoke.com/roatimes/news/story154706.html) reports that the cost for the project will be $5.2 million over the next five years, and that they are trying to get the system set up by October 1st, 2003 to be considered in the next Top 500 Supercomputer rankings.
An interesting tidbit from CollegiateTimes (http://www.collegiatetimes.com/index.php?ID=1748) brings the total weight of the PowerMacs in at 19.25 Tons -- and simply moving the hardware will take about six days with 15-20 volunteers.
evolu
Sep 3, 2003, 02:02 PM
I'm glad they're running OS X contrary to past reports...
kb9000
Sep 3, 2003, 02:03 PM
Can I be a volunteer for moving day??
JoeRadar
Sep 3, 2003, 02:17 PM
During a recent analysts' meeting HP's Fiorina said the mid-range server market was essentially going away, which is hurting HP's numbers. Apparently the high-end server market is relatively stable, with the growth going towards the low-end servers.
Apple seems to have hit the server market at the right time. While the VaTech cluster is PowerMacs and not Xserves, this should help give Apple a lot of street cred.
Regarding the clusters, the currently hot topic, I don't see a lot/any of them based on Windows. It looks like a two horse race for the growth area in the server market: Linux and Macs.
trog
Sep 3, 2003, 02:18 PM
Yeah, I'll help move too! (I don't think they'd miss just ONE of those ;)
ghutchis
Sep 3, 2003, 02:35 PM
No, there aren't many Windows clusters for a few (hopefully) obvious reasons:
- per-CPU/per-seat pricing
- reliability
- UNIX
Pricing is one of the biggies, IMHO. At least right now, Microsoft doesn't have any sort of "cluster pricing" or "unlimited node" deal. Since the whole point of clustering is to have a low-price supercomputer, you don't want to pay much for the OS. Apple has fortunately figured this out.
One can argue reliability, but there's certainly the perception that Windows has lower reliability in the clustering community.
UNIX? Well... most of the uses for clusters are in scientific computation, and the number-crunching stuff has long been written for UNIX or UNIX-like platforms. Why do you want to port your simulation program to Windows? So Linux or Solaris or Mac OS X are great cluster solutions--you can often just recompile the package and it "just works."
I think this is a great market for Apple to push. I've seen a lot of interest in the G5 and if they can undercut Itanium or Opteron clusters, they'll be in a great position. (Yes, there are clusters around me for which we'd love to have >4GB per node for performance benefits.)
-Geoff
Macmaniac
Sep 3, 2003, 02:35 PM
I just want to see all of those computers running at once. I bet it's nice and warm in there:)
shadowfax
Sep 3, 2003, 02:39 PM
Originally posted by evolu
I'm glad they're running OS X contrary to past reports...
as am i. i think this is very good for apple.
i wonder if any other places will be clustering these on this scale anytime soon?
york2600
Sep 3, 2003, 02:44 PM
If you want to use Windows on a cluster you're going to need Windows Server 2003 Enterprise or Datacenter Edition. That's going to run you $4,000 retail with 25 CALs. Granted you'd be able to get a Select license and bring the cost down quite a bit if you were buying say...1100, but you would still be paying an insane amount for an OS that you would be working harder to use in your environment. While *nix is a tougher setup, $4,000 X 1100 can certainly buy a techie to do the admin work.
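As a rough illustration of the multiplication in the post above, here is a minimal Python sketch. It uses only the two figures quoted there ($4,000 retail per node and 1,100 nodes); the function and variable names are made up for the example, and actual volume (Select) pricing is unknown and not modeled.

```python
# Retail OS licensing cost for a hypothetical Windows cluster, using the
# figures quoted in the post. Volume (Select) discounts are not modeled.
RETAIL_PRICE_PER_NODE = 4_000  # USD, Windows Server 2003 with 25 CALs (figure from the post)
NODE_COUNT = 1_100             # machines in the cluster (figure from the post)


def total_license_cost(price_per_node: int, nodes: int) -> int:
    """Return the total licensing cost in USD for the whole cluster."""
    return price_per_node * nodes


retail_total = total_license_cost(RETAIL_PRICE_PER_NODE, NODE_COUNT)
print(f"Retail licensing for {NODE_COUNT:,} nodes: ${retail_total:,}")
```

At retail that comes to $4.4 million, which is the sum the post is comparing against the cost of paying an administrator.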
stoid
Sep 3, 2003, 02:44 PM
Only one question left?
What the hell are they going to do with that much CPU power?!?
Macmaniac
Sep 3, 2003, 02:47 PM
Why, they're going to discover the meaning of life! Also I'm sure they will soon address the question of whether we are alone.
richters
Sep 3, 2003, 02:48 PM
Originally posted by Macrumors
-cut- ... brings the total weight of the PowerMacs in at 19.25 Tons -- and simply moving the hardware will take about six days with 15-20 volunteers.
The Apple advertised weight for the units is 17.8 kilograms each, times 1100 units: yes, that is about right.
However, moving 1100 units over six days, that is about 183 units per day, with even only 15 volunteers, that is about 12 units per person per day.
I figure they could do much better than that.
I think I can easily move many more units in a day, by using some suitable transportation tools.
Come on guys, apply some transportation logistics.
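The weight and throughput figures above can be checked with a short Python sketch. The unit weight, unit count, day count and volunteer range come from the posts; the conversion to short tons assumes the article meant US tons, which the source does not say.

```python
# Check the total weight and the per-volunteer moving rate quoted above.
UNIT_WEIGHT_KG = 17.8         # Apple's advertised weight per Power Mac G5 (from the post)
UNITS = 1100                  # machines in the cluster
DAYS = 6                      # moving time quoted in the article
KG_PER_METRIC_TON = 1000.0
KG_PER_SHORT_TON = 907.18474  # assumption: "Tons" may mean US short tons

total_kg = UNIT_WEIGHT_KG * UNITS
total_metric_tons = total_kg / KG_PER_METRIC_TON
total_short_tons = total_kg / KG_PER_SHORT_TON


def units_per_person_per_day(units: int, days: int, volunteers: int) -> float:
    """Average number of machines each volunteer moves per day."""
    return units / days / volunteers


print(f"Total weight: {total_kg:,.0f} kg "
      f"({total_metric_tons:.2f} metric tons, {total_short_tons:.2f} short tons)")
for volunteers in (15, 20):
    rate = units_per_person_per_day(UNITS, DAYS, volunteers)
    print(f"{volunteers} volunteers: {rate:.1f} units per person per day")
```

The total is about 19.6 metric tons, close to the 19.25 tons quoted, and with 15 volunteers the rate works out to roughly 12 units per person per day, as stated above.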
Shrike_Priest
Sep 3, 2003, 02:50 PM
If Apple just launches a cheap G5 Xserve, they will OWN this market.
I'm thinking this is a great way to get a foot in the door. This is a market where they have a much better chance at raising their market share, considering they're competitively priced here, which they are not in the desktop market.
Lots of Mac servers/clusters means that a lot more people will hear about the platform, and will think that it's more powerful than PCs. And so on. After a while, that'll migrate to the desktop business.
Chomolungma
Sep 3, 2003, 02:53 PM
Originally posted by stoid
Only one question left?
What the hell are they going to do with that much CPU power?!?
an example,
I use a Mac to look at small DNA sequence data matrices...looking for patterns in a large DNA sequence matrix would require an exponential increase in time...The analysis I ran a few weeks ago took 6 days to complete using a single G4 (~70 species, 1800 bp long)...One example of many in the life sciences
-Chomo
sososowhat
Sep 3, 2003, 02:57 PM
Much as I love my Mac...
It just ain't true that an 867 MHz G4 is 5x more powerful than a 2.0 GHz Pentium 4. Nor do I think it's true that a 2 GHz DP G5 is 10x as powerful as the P4.
Something's wrong with their chart.
http://www.collegiatetimes.com/index.php?ID=1748
Waluigi
Sep 3, 2003, 02:57 PM
Originally posted by richters
However, moving 1100 units over six days, that is about 183 units per day, with even only 15 volunteers, that is about 12 units per person per day.
I figure they could do much better than that.
I think I can easily move many more units in a day, by using some suitable transportation tools.
Come on guys, apply some transportation logistics.
That does seem quite low. I've set up many large computer labs with just 3 people (although not quite that large) and the most time-consuming part was putting in all the wires, and making them neat. It shouldn't take 6 days to move the boxes! However, if they aren't paid, they might not really want to push it...but still 12 boxes in a day? This will never get up by October 1st if they continue to snail along at this pace. Oh yeah, they BETTER recycle all the cardboard boxes; otherwise it might just be ridiculous.
--Waluigi
Chealion
Sep 3, 2003, 03:05 PM
Is the 6 units a day including setup? Or is it that the shifts are only an hour long?
johnnowak
Sep 3, 2003, 03:08 PM
Based on gigaflops their chart is 100% accurate.
Well we all know about the gigaflops myth
esheep2001
Sep 3, 2003, 03:16 PM
Well I have to say I always had my doubts about the reports saying they'd be running an OS other than OS X. Not only that but I reckon the recent spurt in Panther activity is to try and get that out to them before they finish the hardware plumbing.
e.
tiktokfx
Sep 3, 2003, 03:21 PM
WRT moving/installation time, considering that Tech is apparently using some new sort of cooling rack, it's possible that there will also be work involved in mounting customized rack hardware to the cases, which could very well involve drilling and whatnot. That would drastically increase the time from what it takes to just pull a CPU out of the box and plug it in.
sanaco
Sep 3, 2003, 03:25 PM
I hope that VT was able to cut a deal and save some money on keyboards and mice and did not have to get them included with the G5. If not, are we going to see 1,099 keyboards and mice on eBay next week?
Doctor Q
Sep 3, 2003, 03:48 PM
Maybe they'll use all these Macs for Folding and SETI, and credit the MacRumors teams if we ask them. Seriously, I assume that these Macs are not there for a specific project but to serve all of Virginia Tech's supercomputer needs.
beefstu01
Sep 3, 2003, 04:03 PM
I'm sorta wondering how they're going to do this with OSX. Not that it's a bad operating system, but it's quite graphics driven. That takes up a little processing power in itself, multiply that by 1100 and that's a huge loss. Is there any way to run OSX without the GUI? This is why Linux is mainly used for clusters- there is the option of running Linux without X.
shadowself
Sep 3, 2003, 04:14 PM
Originally posted by Macrumors
The PowerMacs will be running Mac OS X, and as mentioned before, will be linked with hardware from Mellanox and Cisco.
This is why it will take six days or more to install everything. It is not just setting up Macs on a simple gigabit Ethernet network.
Roanoke Times (http://www.roanoke.com/roatimes/news/story154706.html) reports that the cost for the project will be $5.2 million over the next five years...
Sorry, I don't believe this for a second. Not one second. The total up front cost has to be at least $4 million (computers, additional RAM, networking cards, routers, cabling, racks, cooling systems, etc., etc.) and probably much more. Running it for a cost of $240,000 a year or so is just out of the question. Five high end computer operators will cost much more than that. And that gives only ONE operator on a 24/7/365 basis. (It takes about 5 people to keep one slot fully staffed on a 24/7/365 basis.) Likely on a machine of this size the staff to maintain it will total more than 15 people (1 to 5 people on site at any given moment).
cryptochrome
Sep 3, 2003, 04:19 PM
Originally posted by beefstu01
Is there any way to run OSX without the GUI?
You mean Darwin? Even if some of the additional OS X specific system services are required, I'm sure they can work something out. No need to be so GUI heavy. On a related note, why would such a task require a graphics card? There's a lot of room for tweaking.
I just felt funny and thought I'd calculate the Rpeak-Value with the simple equation
FPops/Cycle * MHz * Number of CPUs
(try it, it works with all the Top500 Rpeak-Values! Xeon and Alpha have 2 FPops/cycle, Power3/4 and Itanium have 4!)
..we end up with 17.6 TFLOPS Rpeak (4 FPops/Cycle * 2 GHz * 1100 Macs * 2 CPUs/Mac), and that is JUST the FPUs without any Altivec!
yes, the G5 has, just like Power3/4 4 FPops/cycle, even though it only has 2 FPUs, because the PPCs combined Multiply-Add-Op accounts for 2 FPops! ;-)
This puts it pretty close to the Rpeak of the 8192 1.25GHz Alpha-Cluster on Place 2 (20.4 TFLOPS)! Now we just need to wait and see how much Rmax they get out of the Rpeak! ;-)
The whole thing becomes even more funny if you consider Altivec-Floatingpoint in the Equation! Even though it can handle 4 values in one cycle Altivec unfortunately can only do 32bit Floats (single precision), but let's just assume for fun we have/need 32bit Floats only: (4 FPops/Cycle + 8 FPops/Cycle (Altivec-FPMul-Add!)) * 2 GHz * 1100 Macs * 2 CPUs/Mac = 52.8 TFLOPS!!!!
Just for comparison: The so far unchallenged Earth Simulator at #1 "only" has 40.9 TFLOPS Rpeak!!
I'm not ******** kidding! ;-) Of course it's only a theoretical value as you just can't feed 2 FPU-MulAdds and 1 Altivec-MulAdd (plus Data!) constantly every single cycle, but remember: All the Rpeak-Values are theoretical maxima! ;-) It's just the question how close you can get with the actual Rmax-Linpack-Result! But if you look at the Rpeak there's certainly a lot of potential in this Cluster!
Now just let someone like NASA's Craig Hunter and his Jet3D or anyone able to write good vectorcode onto that thing and you can watch it SERIOUSLY fly! ;-)
Now, does anyone know how much of linpack needs 64bit? ;-) And one question remains: Does this thing belong to the Microprocessor-Class or the Vectorprocessor-Class (like the japanese ones or Crays!)?
P.S.: Just for information: A single G5 2GHz has 24 GFLOPS theoretical Peak! 8 GFLOPS from the FPUs and 16 GFLOPS from Altivec-FP! If you don't believe me, read here (http://www-3.ibm.com/chips/techlib/techlib.nsf/techdocs/A2CE393ABF2CE99787256D21006AE8A2/$file/PPC970_MPF_Review.pdf), Page 4!
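The Rpeak arithmetic in this post can be reproduced with a minimal Python sketch. The formula and all inputs (FPops per cycle, clock speeds, CPU counts) are the ones given in the post; the function and constant names are invented for the example, and the results are theoretical peaks, not measured Linpack numbers.

```python
# Theoretical peak (Rpeak) = FPops/cycle * clock * number of CPUs,
# using the per-cycle figures stated in the post.
def rpeak_tflops(fpops_per_cycle: float, clock_ghz: float, cpus: int) -> float:
    """Return the theoretical peak in TFLOPS (GFLOPS / 1000)."""
    return fpops_per_cycle * clock_ghz * cpus / 1000.0


G5_CLOCK_GHZ = 2.0
G5_CPUS = 1100 * 2      # 1100 Macs, 2 CPUs per Mac
FPU_FPOPS = 4           # 2 FPUs, each fused multiply-add counted as 2 FPops
ALTIVEC_FPOPS = 8       # 4-wide single-precision multiply-add

fpu_only = rpeak_tflops(FPU_FPOPS, G5_CLOCK_GHZ, G5_CPUS)
with_altivec = rpeak_tflops(FPU_FPOPS + ALTIVEC_FPOPS, G5_CLOCK_GHZ, G5_CPUS)
per_cpu_gflops = (FPU_FPOPS + ALTIVEC_FPOPS) * G5_CLOCK_GHZ
alpha_cluster = rpeak_tflops(2, 1.25, 8192)  # Alpha: 2 FPops/cycle per the post

print(f"G5 cluster, FPUs only:    {fpu_only:.1f} TFLOPS")
print(f"G5 cluster, with Altivec: {with_altivec:.1f} TFLOPS (single precision)")
print(f"Single 2 GHz G5:          {per_cpu_gflops:.0f} GFLOPS")
print(f"8192-CPU Alpha cluster:   {alpha_cluster:.2f} TFLOPS")
```

The Altivec figure only applies to 32-bit floats, as the post notes, so it says nothing about a double-precision Linpack run.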
myrdred23
Sep 3, 2003, 04:24 PM
>>Is there any way to run OSX without the GUI?
Of course, Mac OS X can run headless just fine.
nagromme
Sep 3, 2003, 04:40 PM
If this is true I blame Moto, not Apple--but I sure wouldn't give my money to Apple for a G4 in October. That's too close to when I'd expect a G5!
But if you NEED a laptop before then, the current PowerBooks are still good... get the best deal you can and buy at the bottom-end: the 12" or an iBook. You'll have a nice extra-portable companion for your larger-size PB G5 later on.
A shame, but it sounds like there's nothing we or Apple can do... until the PBG5 is reality. Meanwhile, I'd rather own a 1 GHz PowerBook than any speed of Windows laptop!
In any case, I was always planning to wait for a G5, so this is GOOD news, if it makes Apple rush to make that happen sooner!
Apple will lose a LOT of laptop revenue from this, and some short-term market share too. But Apple can easily survive that.
ryan
Sep 3, 2003, 04:56 PM
Originally posted by shadowself
This is why it will take six days or more to install everything. It is not just setting up Macs on a simple gigabit Ethernet network.
Sorry, I don't believe this for a second. Not one second. The total up front cost has to be at least $4 million (computers, additional RAM, networking cards, routers, cabling, racks, cooling systems, etc., etc.) and probably much more. Running it for a cost of $240,000 a year or so is just out of the question. Five high end computer operators will cost much more than that. And that gives only ONE operator on a 24/7/365 basis. (It takes about 5 people to keep one slot fully staffed on a 24/7/365 basis.) Likely on a machine of this size the staff to maintain it will total more than 15 people (1 to 5 people on site at any given moment).
Probably a large portion of the hardware was sold to them at, or slightly above cost. And you're forgetting that they have an abundance of cheap/free labor, students.
Stoid:
What they are doing with all that CPU-Power?
This (http://news.com.com/2100-1008_3-5070403.html) Article explains what: "research on nanoscale electronics, chemistry, aerodynamics, molecular statics, computational acoustics and molecular modeling, among other tasks."
ryan
Sep 3, 2003, 05:10 PM
Originally posted by york2600
If you want to use Windows on a cluster you're going to need Windows Server 2003 Enterprise or Datacenter Edition. That's going to run you $4,000 retail with 25 CALs.
This may not necessarily be true. When it comes to clusters in a university setting, it has been reported that MS has provided the school with free licenses. I'm not defending MS but the amount that companies/universities pay for licensing in certain situations can be trivial to non-existent.
JoeRadar
Sep 3, 2003, 05:17 PM
Originally posted by beefstu01
Is there any way to run OSX without the GUI?
Think Xserve.
gwangung
Sep 3, 2003, 05:17 PM
Originally posted by ryan
This may not necessarily be true. When it comes to clusters in a university setting, it has been reported that MS has provided the school with free licenses. I'm not defending MS but the amount that companies/universities pay for licensing in certain situations can be trivial to non-existent.
This, of course, applies to Apple just as much. I'd suspect that Apple cut them some deals as well.;)
shadowself
Sep 3, 2003, 06:16 PM
Originally posted by ryan
Probably a large portion of the hardware was sold to them at, or slightly above cost. And you're forgetting that they have an abundance of cheap/free labor, students.
I've been in this situation before. The hardware is rarely done at cost. But even assuming they got the Macs at $2,500 each this is $2.75 million just for the Macs. (This is $2,314 off [almost half off] of the standard educational pricing for a 4 GB 2.0 GHz dual G5, and this is $2,849 off the list price [much more than half off].) Even assuming similar deals on everything else (at least half off retail), the initial cost of an "all in" system (computers, networking cards, routers, cabling, racks, cooling system, high end UPS [you don't run these types of machines right off the public utilities], system administrative setup, etc., etc.) will certainly total over $4 million.
I was a grad student a long, long time ago ($9600 a year for 30+ [often 50+] hours a week as a research fellow; research assistants made even less back then). I don't know what grad students make these days, but I doubt it is much less than $18,000 a year. (You can go flip burgers for not much less than that.) If we assume an average of $18,000 a year over all the staff (undergrads, grad students, professional staff [of which there must be 1 or 2 at least], administrative staff, etc. -- clearly some will be paid much more than $18k a year, e.g. the professional staff, some will be paid less) then the minimum body count I mentioned above for 24/7/365 support (15 bodies) will cost at least $270,000 a year just for personnel. This does not take into account the cost of maintenance, electricity to run the systems and the cooling systems, etc. Even if I assume free hardware replacement parts (some of these 1,100 Macs WILL fail within 5 years) and free software and software upgrades for 5 years and I assume absolutely no cost indexing for inflation over the next 5 years, this is still over the $5.2 million budget.
Thus I still cannot believe a $5.2 million budget to include the initial system and operations for 5 years. The numbers just don't add up.
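For anyone who wants to follow the numbers, here is a minimal Python sketch of the estimate above. Every input is one of the post's own assumptions ($2,500 per Mac, a $4 million floor on up-front cost, 15 staff at an average of $18,000 a year, five years); none of them are confirmed figures from Virginia Tech, and the variable names are made up for the example.

```python
# Back-of-the-envelope five-year cost, using only the assumptions in the post.
MAC_COUNT = 1_100
ASSUMED_PRICE_PER_MAC = 2_500    # USD, the post's deeply discounted guess
UPFRONT_FLOOR = 4_000_000        # USD, the post's minimum "all in" initial cost
STAFF_COUNT = 15                 # minimum head count for 24/7/365 support
AVERAGE_SALARY = 18_000          # USD per year, averaged over all staff
YEARS = 5
REPORTED_BUDGET = 5_200_000      # USD, figure reported by the Roanoke Times

mac_cost = MAC_COUNT * ASSUMED_PRICE_PER_MAC
annual_staff_cost = STAFF_COUNT * AVERAGE_SALARY
five_year_total = UPFRONT_FLOOR + annual_staff_cost * YEARS

# What the reported budget would leave per year after the up-front floor.
implied_annual_operating = (REPORTED_BUDGET - UPFRONT_FLOOR) // YEARS
shortfall = five_year_total - REPORTED_BUDGET

print(f"Macs alone:               ${mac_cost:,}")
print(f"Staff per year:           ${annual_staff_cost:,}")
print(f"Five-year estimate:       ${five_year_total:,}")
print(f"Implied operating budget: ${implied_annual_operating:,} per year")
print(f"Over reported budget by:  ${shortfall:,}")
```

Under these assumptions the estimate lands at $5.35 million before power, cooling, maintenance or inflation, which is the gap the post is pointing at.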
mkaake
Sep 3, 2003, 06:31 PM
what i thought was interesting was 'beta' version of x...
which to me answers all of the questions people have about x and clustering and cutting out the gui... i can't help but imagine this as a supercharged version of x that is specifically designed to work in a cluster environment...
matt
<edit>
well re-reading seems to make it sound like just the beta of panther, but i still can imagine a homegrown brew of x for clustering...
of course, that would be os x server.
hmm.
i'm gonna stop talking.
now.
macfreek57
Sep 3, 2003, 07:01 PM
i know that you can run os x in command-line mode
just change user settings to require the user name and password ( i can't remember the exact wording or which pref pane it's in: i'm not at my mac right now).
then log out (apple-shift-q)
at the login screen type >console in the user box and hit return
you will be taken to a black screen (what looks like Terminal maximized to your whole screen) and asked for a user name and password (you can't backspace if you mess up on the password, FYI).
to return to the login screen, type exit and wait a second
i do this all the time just for the fun of it ("woah! your mac looks like it's running DOS!").:D
theRebel
Sep 3, 2003, 08:52 PM
Originally posted by shadowself
Thus I still cannot believe a $5.2 million budget to include the initial system and operations for 5 years. The numbers just don't add up.
You seem to be making a wrong assumption here.
The $5.2 million does not include the university's pre-existing operating expenses. The $5.2 million is how much EXTRA that they will have to pay for this new system; the $5.2 million is the amount of additional expenses above what they would be paying if they did not have the new G5 cluster.
The university may have to hire additional employees for the new system, but nowhere near the numbers that you were suggesting. The university already has an extensive staff of employees maintaining other computer systems 24/365 (btw, why do you include a 7 ? ) who can also help monitor and maintain the new cluster without any extra expense to the university. Existing university professors who will be using this cluster will also be providing some of the support at no extra cost to the university. Thus the only payroll expenses in the $5.2 million figure are related to additional new hire positions.
tychay
Sep 3, 2003, 11:13 PM
Originally posted by theRebel
You seem to be making a wrong assumption here.
The $5.2 million does not include the university's pre-existing operating expenses. The $5.2 million is how much EXTRA that they will have to pay for this new system; the $5.2 million is the amount of additional expenses above what they would be paying if they did not have the new G5 cluster.
This may be true, but I think that it is far more likely that there was an error in the reporting. It seems to me that even accounting for any discounts on the computers and hardware, the cluster itself would cost around $5 million minus labor. If we assume labor and maintenance is free (which it won't be, most clusters have a 24/7 staff devoted just to replacing downed nodes), we are still left with the additional operating expenses, power and cooling, which can get quite sizable (and that's ignoring the building space). Remember, InfiniBand cards and switches, though cheaper than other solutions, will still easily run over $1 million for the cluster. I don't expect Apple to be selling "at cost"--I've never heard of a single cluster being built that way.
My guess is that the $5.2 million is the cost of the hardware only or an expected "net" five-year cost. The actual five-year "all-in" cost would be higher, but that doesn't mean the university has to absorb all that. Heck, with the rates that NEC EarthSimulator charges, they could maybe even make a profit off of licensing time on the cluster.
Balooba
Sep 4, 2003, 10:43 AM
This must mean that Apple is not planning to release any g5 Xserves in a while.
Surely the cluster would have been better with xserves than with PowerMacs. It would be smaller, easier to manage, and not contain unnecessary graphics cards.
Surely, Apple would also have preferred them to use Xserves and would probably have told Va Tech if they had them coming soon.
No Xserves in a good while?!
shadowself
Sep 4, 2003, 11:45 AM
Originally posted by theRebel
You seem to be making a wrong assumption here.
The $5.2 million does not include the university's pre-existing operating expenses. The $5.2 million is how much EXTRA that they will have to pay for this new system; the $5.2 million is the amount of additional expenses above what they would be paying if they did not have the new G5 cluster.
I was only talking about EXTRA expenses -- not pre-existing operating expenses. I assumed they already had a building with the capability to install a 1,100 node cluster. I assumed they are not adding things like additional connectivity to the Internet or Internet2. I assumed they were not adding incidental things like additional security systems/security personnel. I assumed they are not adding things like administrative/accounting overhead to track and bill users for their usage of time on the cluster. If you get into the nitty-gritty details of a major installation like this there are literally thousands of little things that individually cost, on average, only a few hundred dollars a year but add up to several tens or hundreds of thousands of dollars in total. I have purposely not included any of this stuff in order to come up with what I consider the absolute lowest theoretical cost over five years. This cost is still well above $5.2 million.
Originally posted by theRebel
The university may have to hire additional employees for the new system, but nowhere near the numbers that you were suggesting.
Hiring 15 people is not a large number. In reality when accounting for undergrads and grad students the total number of paid people working on/supporting the system over the course of any given year will almost definitely exceed 15.
Originally posted by theRebel
The university already has an extensive staff of employees maintaining other computer systems 24/365 (btw, why do you include a 7 ? ) who can also help monitor and maintain the new cluster without any extra expense to the university.
Maintaining other computer systems? You mean they already have tasks to do? Then how are they going to maintain/support a huge 1,000-1,100 node cluster? For something this size there are always (and I do mean always) dedicated support personnel. The university will have to hire people (whether they are grad students or computer professionals does not matter in this discussion -- whether they already work for the university or are brought in from Los Alamos does not matter for this discussion -- the fact is that the university will have to create new, dedicated labor slots for this machine).
By the bye, I include the "7" out of habit from the old days. That is the way I learned it way back when (24 hours a day, seven days a week, 365 days a year). These are the fixed numbers. Since the monthly number changes month to month it is not included.
Originally posted by theRebel
Existing university professors who will be using this cluster will also be providing some of the support at no extra cost to the university.
Users don't typically do maintenance on these large systems. The university professors will want to spend their time relative to the machine designing problems the computer can solve (developing theories, writing the algorithms, writing and debugging the software, etc.). Certainly a small subset of these professors will have some type of administrative responsibility over this system, but it will be rare to find the professors in there troubleshooting and repairing a failed node or interconnect.
Originally posted by theRebel
Thus the only payroll expenses in the $5.2 million figure are related to additional new hire positions.
That's all I was referring to. It does not matter if the new hire is a grad student or a secretary. The slot for this system is new to the university thus it is additional dollars. (Additionally, it does not matter if the secretary comes over from another department within the university. It is still a new slot for this system and most likely the university will have to replace him/her in the slot s/he moved from.)
Originally posted by Balooba
This must mean that Apple is not planning to release any g5 Xserves in a while.
Surely the cluster would have been better with xserves than with PowerMacs. It would be smaller, easier to manage, and not contain unnecessary graphics cards.
Surely, Apple would also have preferred them to use Xserves and would probably have told Va Tech if they had them coming soon.
No Xserves in a good while?!
My best guess is that there won't be any G5 XServes until January at the earliest. The university using stock PowerMac G5s supports this guess.
The university wanted to be listed in the next Top 500 listing. This means they have to have the thing up and running in October of this year. If there are no XServes until 2004 then they had to use stock PowerMacs. Apple is not going to push/hasten development of a new XServe for a single 1,100-unit order. It is just not cost effective, and hastening development of a new product often comes back and bites you in the form of unanticipated bugs.
AidenShaw
Sep 4, 2003, 01:34 PM
http://news.com.com/2100-1010_3-5071318.html
...The University of Utah will set up a supercomputer that's based on the Opteron processor from Advanced Micro Devices
...Last month, Los Alamos National Laboratory chose Linux Networx to build two large computing clusters that are based on Opteron. "Lightning," the larger of the two clusters, will contain 2,816 Opteron processors
shadowself
Sep 4, 2003, 01:50 PM
Originally posted by AidenShaw
http://news.com.com/2100-1010_3-5071318.html
...The University of Utah will set up a supercomputer that's based on the Opteron processor from Advanced Micro Devices
Yes, and they are claiming the initial cost for 1,000 nodes is $2 million.
I was at the UofU back when they got a "super," a full-blown IBM 3090-600VF, and was very loosely associated with the machine. It was a $22 million setup ["list price" on everything] that they did a "special deal" with IBM for and got for only $2 million -- or so they said. In reality, once all the costs were added up, the cost was almost 5 times that. Still a good deal, but nowhere near the cost quoted widely to the media at the time. [Additionally, it was set up in the business department -- not the computer science department or any other department of the college of engineering, nor was it set up in any department of the college of science. Go figure.]
Having that experience with the UofU many years ago is one of the many reasons why I don't believe the $5.2 million Virginia Tech five-year number. Nor do I believe the UofU $2 million number.
There are always hidden costs they don't talk about that sometimes are as much as 4 to 5 times the quoted amount.
Doctor Q
Sep 4, 2003, 05:40 PM
Originally posted by AidenShaw
...The University of Utah will set up a supercomputer that's based on the Opteron processor from Advanced Micro Devices
...Last month, Los Alamos National Laboratory chose Linux Networx to build two large computing clusters that are based on Opteron. "Lightning," the larger of the two clusters, will contain 2,816 Opteron processors
I like the idea that the supercomputer market has the competition necessary to keep the vendors on their toes and the prices "reasonable". The next time a university is ready to set up a supercomputer facility, they should be evaluating just how these facilities have fared.
cbfro
Sep 4, 2003, 08:13 PM
Tonight was the second night of orientation sessions for volunteers helping to set up the cluster. VT is actually receiving 1105 computers, 5 of them being single processor machines for administration purposes I would assume.
For those of you wondering why it is going to take so long to set up the cluster, here is the answer: First off, they are only unpacking 15 computers at a time, checking for DOAs (hopefully there will be none), installing the InfiniBand cards, and then moving them to the rack room to be placed in the rack. Secondly, the only cables run so far are the power cords. That leaves two InfiniBand Fibers and Gigabit Ethernet still to be put in place. AND the real kicker is that the computers aren't coming in all at once, instead they will arrive on three different days.
cbfro
Sep 4, 2003, 08:19 PM
Predictions on resales to students seem to be correct. All boxes, keyboards, and mice are being stored for the update to "a new form factor." I would say that is a pretty good hint towards an upgrade path to include Xserve G5s or whatever may come after that.
The building these things are going in already has security for it, so no expense there. Plus the people in charge of running these things are current employees of VT. At most they may hire one or two more people to assist, but I see this thing being staffed mostly by grad students.
And a word on what they will be running on these machines. It will indeed be OS X with custom Fortran and C programs in addition to a couple commercial packages.
Doctor Q
Sep 4, 2003, 08:42 PM
Originally posted by cbfro
The building these things are going in already has security...
Note to self: Cancel plan to wander in and walk off with a couple of Power Macs under each arm.
And a word on what they will be running on these machines. It will indeed be OS X with custom Fortran and C programs in addition to a couple commercial packages.
Have the specific projects been announced, i.e., the purposes of these programs?
tychay
Sep 4, 2003, 09:44 PM
Originally posted by shadowself
Yes, and they are claiming the initial cost for 1,000 nodes is $2 million.
My guess is that is the node cost and doesn't cover any of the switching hardware, let alone the labor, etc. I found the article had an interesting reference to the Angstrom superblades (http://www.angstrom.com/products/form_hpc_superblade.htm) which I never heard of before.
Their website is pretty scarce on information, but the density is achieved by stacking the blade centers back to back (apparently the cooling is bidirectional). That's because the rack density is not very good (9U unit holds 13 blades). The blades can carry 2 Opterons with 4 DIMM bays (there are no specs on the actual blades, no specs on the HD or integrated video, etc).
I was curious if anybody knows anything about these things since their website is highly uninformative (even the prices are missing). For instance, are the power supplies hot-swappable/redundant? Are the fans hot-swappable/redundant? How can you hot-swap when the units are back to back? Do they have a Fibre Channel interface? Can you do any networking other than serial/Gig-E? It seems like not, because they cut the length of the rack in half.
All in all, Utah is basically ordering a 1000 node BladeCenter from a company with no previous blade experience (their ultra dense systems were simply two dual Athlon mobos crammed into a 1U case). I thought that nobody would take on a strategy as risky as VaTech (building a cluster from an unreleased computer), but I was wrong.
If they were both finished today, I think the Utah system would have a higher Rpeak (I can't be sure since I'm too lazy to look up everything). Rmax would be lower due to bandwidth limitations of the blades. Even the former would be reversed if someone got a version of double precision Linpack (the test for the Top500) to work with Altivec optimizations (a 4x increase)--there are some notes about Altivec optimized single precision, but none about double. Is that a limitation of the G4 Altivec and does this limitation carry over to the G5 VMX? Anyone?
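For anyone who wants to plug in their own numbers: Rpeak is just nodes x CPUs per node x clock x flops per cycle. Here is a rough Python sketch for the VT side only. The 1100-node count and dual 2 GHz CPUs come from this thread; the 4 double-precision flops per cycle per CPU (two FPUs, each doing a fused multiply-add) is my assumption, and the function name is made up for illustration. The 4x vector line only restates the hypothetical above, since no double-precision Altivec exists today.

```python
def rpeak_gflops(nodes, cpus_per_node, clock_ghz, flops_per_cycle):
    """Theoretical peak in GFLOPS: every CPU retiring its maximum flops on every cycle."""
    return nodes * cpus_per_node * clock_ghz * flops_per_cycle

# From the thread: 1100 compute nodes, each a dual 2 GHz PPC970.
# ASSUMPTION: 4 double-precision flops per cycle per CPU
# (two FPUs, each capable of one fused multiply-add per cycle).
vt_scalar = rpeak_gflops(nodes=1100, cpus_per_node=2, clock_ghz=2.0, flops_per_cycle=4)

# HYPOTHETICAL: the "4x increase" mentioned above, if a double-precision
# vector path existed. It does not exist on current Altivec hardware.
vt_vector_hypothetical = 4 * vt_scalar

print(f"VT scalar Rpeak: {vt_scalar / 1000:.1f} TFLOPS")
print(f"VT with a hypothetical 4x vector speedup: {vt_vector_hypothetical / 1000:.1f} TFLOPS")
```

Swap in the blade count, clock, and flops per cycle for the Utah system to compare, once somebody digs up the real specs.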
Take care,
terry
Sun Baked
Sep 4, 2003, 09:49 PM
Originally posted by OoklaTheMok on Ars in Notes from Virginia Tech presentation on G5 cluster (http://arstechnica.infopop.net/OpenTopic/page?a=tpc&s=50009562&f=8300945231&m=2530985385&r=2530985385#2530985385):
A friendly little bird who was there gave this to me. Posted with all comments intact. She didn't understand everything they were talking about; maybe some of you will.
-----
The following are notes taken from a powerpoint presentation given at 11AM
9/4/2003
more info can be found at
http://www.computing.vt.edu/research_computing/terascale
"Terascale Computing Facility"
Opening remarks:
-An advisory committee has been appointed for governance of use. [Yay.]
Slide One
Computational Science and Engineering Institute[?]
Goals:
- to build a world class facility
- to provide high performance network to tie in with computational grids
-connect supercomputers, visualizations, and data storage
Slide Two
Goals and Scope
1. support research in computational science and engineering
2. dual usage: production & experimental (apps)
3. create beneficial collaboration [scribble scribble, I need to write things I can read]
Slide Three
TCF
- based on 64-bit architecture
- employs high bandwidth low latency communications fabric
- operational for production [apps?] in Fall 2003; fully operational by the end of the year
Slide Four
Choosing the Right Architecture
- cost vs. performance (purely)
- total cost $5.2 million includes system itself, memory, storage, and communication fabrics
- one of the cheapest systems of its kind
Slide Five
Architectural Options [or something like that]
- Dell - too expensive [one of the reasons for the project being so "hush hush" was that dell was exploring pricing options during bidding]
- Sun (sparc) - required too many processors, also too expensive
- IBM/AMD (opteron) - required twice the number of processors and was twice the price in the desired configuration; had no chassis available
- HP (itanium) - ditto
- Apple (IBM PPC970) - system available with chassis for lowest price
Slide Six
Nodes
- dual PPC970 2GHz
- Each node has:
- 4 GB RAM
- 160 GB serial ATA
- 176 TB total secondary storage
- 4 head nodes
- 1 management node
- most powerful "homebuilt" supercomputer in the world
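[Editor's aside: the totals on this slide can be sanity-checked with a few lines of Python. This is only a sketch; it assumes 1100 compute nodes (the 1105 machines reported earlier in the thread minus the 5 admin boxes) and decimal units (1 TB = 1000 GB). The constant and function names are made up for illustration.]

```python
NODES = 1100            # ASSUMPTION: 1105 delivered machines minus 5 admin machines
DISK_GB_PER_NODE = 160  # from the slide
RAM_GB_PER_NODE = 4     # from the slide

def total_tb(per_node_gb, nodes=NODES):
    """Aggregate capacity in decimal terabytes (1 TB = 1000 GB)."""
    return per_node_gb * nodes / 1000

disk_tb = total_tb(DISK_GB_PER_NODE)
ram_tb = total_tb(RAM_GB_PER_NODE)

print(f"secondary storage: {disk_tb:g} TB")
print(f"aggregate RAM:     {ram_tb:g} TB")
```

[Under those assumptions the disk total comes out to the 176 TB quoted on the slide, which suggests the figure is simply per-node disk times node count.]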
Slide Seven
Reliability
- commodity clusters have issues due to the large number of units
- VT developed transparent fault tolerance system called "Deja Vu"
- collaborated with PSC
- can recover from just about every failure, i.e. someone hits the wrong switch, OS crashes, things fail in general, power loss, etc
- This system has been ported to the G5 and will be deployed in the TCF
Slide Eight
Primary C [omputer??] Architecture
- working with Mellanox for infiniband solutions
- the system is [obviously] based on infiniband technology
- full switch network 20 Gbps, full duplex
- 24 96 port switches in "fat-tree topology"
Slide Nine
Secondary Com[????munications? ...puter?]
- Gigabit fast ethernet management backplane
Slide Ten
National Lambda Rail (nationwide optical network)
- all networking equipment [at least for this locale] is CISCO
- the following organizations are involved with NLR:
-CENIC
-CISCO
-Duke
-Florida Consortium
-Georgia Tech
-Internet2
-MATP
-PNWGP [pacific northwest group]
-Texas [I imagine a university, not the whole state. "Yes, Texas backs National Lambda Rail, yessir."]
- additional player: PSC [yeah I don't know either]
- VT leads Washington DC point of presence
- DC node goes active in the first half of 2004
Slide Eleven [maybe]
Software
- Mac OS X
- Why not linux? Not enough support.
- Mellanox does InfiniBand drivers and HCA
- MPI (parallel communications libraries)
- Argonne National labs to get MPI-2 for the system
- C, C++ compilers - IBM xlc and gcc 3.3
- Fortran 95/90/77 Compilers - IBM xlf and NAGWare
Slide Twelve [I should give up soon]
Sustainability Model
(organizations that could make use of the facility or have already expressed interest)
- Federal organizations
-NSF, Cyberinfrastructure Program
-NIH, DOE, DARPA, DoD, AFOSR[??], ONR [?????]
- Industry (the system can attract industrial interest)
- External Research Partners
- National Labs, Supercomputer Centers, NASA, NIA
Slide Thirteen [I never learn]
Access
- internal access not based solely on research funding contributed
- priorities might be established based on contribution at a later time
- provide easy access for investigators [I missed the end of this line]
- external access determined on a cost recovery basis
Slide Fourteen
Future
- Computational Science and Engineering is a long-term project
- Current facility will be followed with a second in 2006
Slide Fifteen
Timeline
- Oct. 1st - preliminary operations
- Oct. 1st - Mid Nov. - performance optimization and benchmarking
- Mid Nov. - available for initial apps ("hero-users" [heh, i.e. the poor suckers who test out the initial config])
- available to any user with operating MPI coverage [huh?]
- Jan 2004 - fully operational
PART II of presentation
-insert lots of information here
VT Op. Center will be staffed 24-7
Facility
- 3 MW power, double redundant with backups - UPS and diesel
- 1.5 MW reserved for the TCF
- 2+ million BTUs of cooling capacity using Liebert's extreme density cooling (rack mounted cooling via liquid refrigerant)
-traditional methods [fans] would have produced windspeeds of 60+ MPH
--insert photos of cool stuff here- facilities, a G5, cooling rack set up-
Racks were custom designed for this system
Usage
- dual usage - production and experimental
- experimental operations will not interfere with production use
CSE Research Avenues
- nanoelectronics
- quantum chemistry
- computational chemistry/biochemistry
- fluid dynamics
etc, etc, etc
PART III
Services [Offered at the Facility]
- code development assistance
- general support
- grant writing support
[missed an item here]
Code Development Assistance
- HCSE
- Code kitchens w/apple and others
- FDI - expanding research tracks
Housed in AISB Machine Room
- Sysadmins will be available 24/7/365
- Tiered support
- Grant writing support
- Marketing and Business Rev[something.. enue?]
Official presentation ended here, begin Q and A!
How long would it take to parallelize the code?
- Highly application dependent
- can build in an afternoon or take weeks/months
- duties of the management node can be covered by any of the four head nodes
- The system is a distributed memory machine, not a shared memory machine
If 2 battling applications - no interference in terms of communication
What kind of physical security for the facility?
-access to the building via keycard
-access to the machine room determined via biometrics
Online material concerning the project? Not at this time; probably up later today or late tonight
Does it render? Yes, incidentally, it does. The units came with high end graphics cards
- Plans to connect with The CAVE? Maybe.
Source of funding for the project? From different colleges within the university.
Are there limitations on the level of complexity that the system can be used for? Not yet fully considered, but no web servers and no hosting an irc network
Why so secret? Project started back in February; secret with Dell because of the pricing issues; dealt with vendors individually because bidding wars do not drive the prices down in this case
Deja vu does not do load balancing.
tychay
Sep 4, 2003, 10:43 PM
Originally posted by Sun Baked
The following are notes taken from a powerpoint presentation given at 11AM
9/4/2003
Thanks!
- total cost $5.2 million includes system itself, memory, storage, and communication fabrics
I guess this answers that debate. It's $5.2 million for the system itself as some of us guessed. The 5-year cost must have been a misquote.
- Each node has:
- commodity clusters have issues due to the large number of units
- VT developed transparent fault tolerance system called "Deja Vu"
- collaborated with PSC
- can recover from just about every failure, i.e. someone hits the wrong switch, OS crashes, things fail in general, power loss, etc
...
- additional player: PSC [yeah I don't know either]
I assume this answers my RAM reliability problem. BTW, I would think PSC means Pittsburgh Supercomputing Center (http://www.psc.edu/), which makes a lot of sense. They were on par with SDSC (where the director of VaTech's project is from).
- Mac OSX
- Why not linux? Not enough support.
- MPI (parallel communications libraries)
- Argonne National labs to get MPI-2 for the system
That answers all the people who were so sure it had to be Black Lab Linux (http://www.terrasoftsolutions.com/products/blacklab/). It turned out, as expected, it isn't an issue of kernel support of clustering since they're running an application-layer massively parallel system (MPI).
- C, C++ compilers - IBM xlc and gcc 3.3
- Fortran 95/90/77 Compilers - IBM xlf and NAGWare
IBM xlc and xlf for the G5 were just released in Beta. Coincidence?
- Industry (the system can attract industrial interest)
- external access determined on a cost recovery basis
Should help with some of the costs...
- Oct. 1st - preliminary operations
- Oct. 1st - Mid Nov. - performance optimization and benchmarking
- Mid Nov. - available for initial apps ("hero-users" [heh, i.e. the poor suckers who test out the initial config])
- Jan 2004 - fully operational
Ouch, I was wrong about this. This still seems very aggressive to me, but it does explain why IBM entered the bid with an Opteron-based system (IBM G5 blades/servers won't be available until 1Q 2004), as well as answering the "why not a G5 Xserve" rants.
- Sysadmins will be available 24/7/365
This should make a certain someone happy. Note the reference to 24/7/365! ;)
Does it render? Yes, incidentally, it does. The units came with high end graphics cards.
Can anyone explain what the heck this means?
Doctor Q
Sep 4, 2003, 11:28 PM
Current facility will be followed with a second in 2006
Too soon to start speculating about what a state of the art supercomputer would be like in 3 years? They might be buying 10^6 Power Mac G8s instead of 10^3 Power Mac G5s.
szark
Sep 5, 2003, 12:03 AM
Originally posted by Sun Baked
- 176 TB total secondary storage
Only 176 TB? ;)
Originally posted by Sun Baked
Does it render? Yes, incidentally, it does. The units came with high end graphics cards
How long would it take to render Finding Nemo on this system...
Originally posted by Sun Baked
- Plans to connect with The CAVE? Maybe.
Ooooooooooooooo. G5 driven CAVE system. Nice. :D
Originally posted by tychay
Even the former would be reversed if someone got a version of double precision Linpack (the test for the Top500) to work with Altivec optimizations (a 4x increase)--there are some notes about Altivec optimized single precision, but none about double. Is that a limitation of the G4 Altivec and does this limitation carry over to the G5 VMX? Anyone?
As far as Floating Point goes, Altivec in its current form can only work on 4 32bit-Floats/single precision (with Integer there's also the option of 8*16bit and 16*8bit! The Integer-part of Floats is usually 15bit from what I know, hence 16bit or 8bit floats wouldn't really make sense! ;-), also on the G5, since the G5's "VMX" is more or less just the Altivec-Unit they made back in the day together with Moto in Somerset for the 7400-G4, and they just seem to have dug out their old masks and layouts! ;-)
They could add the option of 2*64bit in a later Altivec-Revision, but that's not that much SIMD (Single Instruction MULTIPLE Data) anymore, is it? ;-) Would still be nice, as you could crunch twice the 64bit-Numbers compared to today (4 instead of 2 at a time!)!
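To make the lane counts concrete, here's a little Python sketch that just divides a 128-bit register by the element size. It isn't Altivec code, only arithmetic with the standard struct module; the names are made up for illustration, and the 64-bit double row is the hypothetical extension discussed above, not something current Altivec can do.

```python
import struct

VECTOR_BYTES = 16  # one 128-bit Altivec register

# struct format codes: f = 32-bit float, d = 64-bit double,
# h = 16-bit integer, b = 8-bit integer ("<" disables padding)
ELEMENT_CODES = {
    "32-bit float": "f",
    "16-bit integer": "h",
    "8-bit integer": "b",
    "64-bit double (HYPOTHETICAL, not in current Altivec)": "d",
}

def lanes(code):
    """How many elements of the given struct type fit in one 128-bit register."""
    return VECTOR_BYTES // struct.calcsize("<" + code)

for name, code in ELEMENT_CODES.items():
    print(f"{name}: {lanes(code)} per vector")

# Four single-precision floats fill exactly one register's worth of bytes.
packed = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
print(len(packed) * 8, "bits")
```

So a 2*64bit mode would only give two lanes per instruction, which is the "not that much SIMD" point.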
And an extension from 128bit to 256bit would be quite an undertaking! Altivec-Units already generally take up quite a bit of Die-Space!
There have been rumours about an extended Version of Altivec for some time, but nothing definite has emerged! It seems though that since the PPC970 was made IBM does have an interest in Altivec and developing it finally! I wouldn't count on Motorola anymore to do the job!
Point is: Quite a few compute-intensive jobs can work with 32bit-Floats, just look at NASA's Jet3D or Genentech's BLAST gene sequencing!
And no, SSE2 is *not* any better "since it can do 64bit double precision", because in a P4 you can either use SSE2 *or* the really weak single-precision 32bit-FPU, not both at the same time, since it's basically the same Unit! ;-) With the Athlon and Opteron, they just "map" SSE(2) commands onto the regular FPUs internally, hence they just fare "okay" in that!
With the G4 and G5, the 64bit-FPU(s) and Altivec *can* run in parallel if properly scheduled, and I sure hope IBM's XLC does a good job in scheduling appropriately! ;-)
What's more is that the G4s and G5s FPUs are naturally Double Precision 64bit (and have been since 60x-times!), a P4 and Athlon/Opteron either have to emulate it using the 32bit-FPU working on 2 32bit-Floats (huge performance-hit!) or have to use SSE2 for it!
So the G5s FPUs on their own already are very powerful and competitive to Systems ALREADY using the SSE2-SIMD-Extension, and Altivec is for the moment just a VERY nice bonus, that can later on be pretty nicely used for certain Tasks! :-)
This is for the Virginia Tech People on here: GET CRAIG HUNTER FROM NASA IF YOU WANT TO DO FLUID DYNAMICS AND SEE THAT THING FLY USING ALTIVEC! (http://members.cox.net/craig.hunter/g5/) ;-)
P.S.: Altivec seems to work very well though for quad or octa-precision! Check out the PDF (http://developer.apple.com/hardware/ve/pdf/G4multiprecision.pdf) and Sample Code (http://developer.apple.com/samplecode/Sample_Code/Devices_and_Hardware/Velocity_Engine/VelEng_Multiprecision.htm) on Apple's Website!