Virginia Tech Supercomputer Cluster Info Session
MacRumors
Sep 5, 2003, 12:44 AM
On September 4th, 2003, Virginia Tech held an informational session for its upcoming supercomputer cluster. The new cluster has received a lot of attention because it is expected to be one of the top computer clusters in the world and will use 1,100 of Apple's new dual 2.0GHz PowerMac G5s.
The informational session confirmed some of the previously available information and provided some interesting details of the planning stage. As previously reported, the total cost of the Supercomputer Cluster comes to $5.2 million -- which includes systems, memory, storage and "communication fabrics".
Even at $5.2 million, the system is said to be "one of the cheapest systems of its kind". In determining which architecture to use, Virginia Tech considered several vendors besides Apple -- including Dell, Sun, IBM and HP. The final decision was made on a pure Cost vs Performance basis -- with Apple's solution providing the best overall price.
The PowerMac G5 systems will be running Mac OS X, and will also use a custom "fault tolerance" software system called Déjà Vu. This fault tolerance system will allow the cluster to withstand "just about every failure".
The cluster is expected to begin operations on October 1, 2003, with performance tweaking through mid-November. At that point it will be opened to initial applications, with a fully operational cluster expected by January 2004.
The entire transcript of notes is provided here (http://www.chaosmint.com/mac/vt-supercomputer/). Virginia Tech student Myuuchan took these detailed notes at the session, and they were submitted to MacRumors by Cless.
GrannySmith_G5
Sep 5, 2003, 12:48 AM
very nice.
Booga
Sep 5, 2003, 12:55 AM
The only acronym for PSC I know, and one that fits with supercomputing, is the Pittsburgh Supercomputing Center, in Pittsburgh, PA. It's one of the original NSF supercomputer centers, operated from Carnegie Mellon University, and one of the reasons the internet's predecessor was created. I believe PSC upgraded through the years from a Cray X-MP to a Y-MP to a T3D, and who knows what they have now.
It's kind of funny and makes a certain amount of sense they'd be involved. CMU must have been one of the biggest customers for NeXT computers back in the day, and has always maintained decent Mac clusters as well as their UNIX and Windows offerings for their students.
woodsey
Sep 5, 2003, 01:02 AM
Just a quick question...
When Apple releases dual 3GHz G5s next August, can they simply be added to the cluster to increase the processing power, or do all the machines need to be the same?
reyesmac
Sep 5, 2003, 01:21 AM
Too bad they can't run Virtual PC on that thing.
I wonder if they run the Mac OS on those things or some weird supercomputer OS I never heard of.
MacFan26
Sep 5, 2003, 01:24 AM
Originally posted by reyesmac
I wonder if they run the Mac OS on those things or some weird supercomputer OS I never heard of.
The article says that they will run Mac OS X.
rotorblade
Sep 5, 2003, 01:32 AM
At least we won't have to hear any crap spewing from that blow hole under the nose of Michael Dell about this one.
IIRC from the CNET article it's a beta version of the OS, which seems unlikely, so I expect it's Panther.
Potus
Sep 5, 2003, 01:35 AM
Unbelievably and totally cool!
Originally posted by mvc
IIRC from the CNET article it's a beta version of the OS, which seems unlikely, so I expect it's Panther.
Panther is a Beta version of the OS. :)
arn
Yup, but the part I found unlikely was that they would run with a beta at all. ;)
Not explaining myself properly - been a long day
Myuuchan
Sep 5, 2003, 01:51 AM
Originally posted by Booga
The only acronym for PSC I know, and one that fits with supercomputing, is the Pittsburgh Supercomputing Center, in Pittsburgh, PA. It's one of the original NSF supercomputer centers, operated from Carnegie Mellon University, and one of the reasons the internet's predecessor was created. I believe PSC upgraded through the years from a Cray X-MP to a Y-MP to a T3D, and who knows what they have now.
It's kind of funny and makes a certain amount of sense they'd be involved. CMU must have been one of the biggest customers for NeXT computers back in the day, and has always maintained decent Mac clusters as well as their UNIX and Windows offerings for their students.
Yes, that's exactly what it was. I'd forgotten entirely.
Sorry about the confusion!
ZildjianKX
Sep 5, 2003, 01:52 AM
Can someone tell me how exactly 1,100 Dual G5s being shipped to them was supposed to slow down shipping of Dual G5s for everyone else? There were supposedly over 100,000 G5s ordered, and supposedly most of those were duals...
Myuuchan
Sep 5, 2003, 01:56 AM
Originally posted by woodsey
Just a quick question...
When Apple releases dual 3GHz G5s next August, can they simply be added to the cluster to increase the processing power, or do all the machines need to be the same?
At the present moment, they're trying to establish functionality with homogeneous machines. However, it was mentioned that they may consider adding machines of different architecture later on in the process - a step up in processing speed might occur at that time, though they made no mention of that specific detail.
Myuuchan,
Thank you very much for taking and providing these notes... as you can see, people are closely interested in this topic. :)
arn
legion
Sep 5, 2003, 02:53 AM
"The final decision was made on a pure Cost vs Performance basis"
Seems like a little backtracking (and spin)... IIRC, the big deciding factor was whether it could be delivered on the schedule VT needed. Apple promised to make sure it would be on time; others couldn't commit. Well, of course school admins have to justify their decision-making, especially at a public school during budget cuts.
As for Dell and supercomputing: #25 on the list, 2 Tflops, 600 2.4GHz Xeons at SUNY.
Nermal
Sep 5, 2003, 04:11 AM
Hmm, I've got an icon called Déjà Vu in my System Prefs. It's part of Toast 6. With the number of lawsuits going around these days, I sense another one coming along :(
admford
Sep 5, 2003, 05:07 AM
Originally posted by Myuuchan
At the present moment, they're trying to establish functionality with homogeneous machines. However, it was mentioned that they may consider adding machines of different architecture later on in the process - a step up in processing speed might occur at that time, though they made no mention of that specific detail.
Actually, keeping all of the machines the same on a cluster helps with programming. Since all machines are equally fast, you don't have to worry much about the timing of data transfers to the server. When you have a cluster of different machines, programming becomes more problematic, and Virginia Tech wants to run at least a basic benchmark to see the effective speed of the cluster.
cb911
Sep 5, 2003, 05:30 AM
that's going to be one kickin' computer!! :D
centauratlas
Sep 5, 2003, 06:34 AM
It is too bad the quad-processor machines (or Xserves) weren't available for them to order. (Or, for that matter, 8-way machines.)
Regarding using the same processor in each machine: with this type of architecture it shouldn't matter a whole lot whether they are the same speed. This is a MIMD machine, not a SIMD machine, so the odds of it mattering are low.
MrMacMan
Sep 5, 2003, 07:00 AM
internal access not based solely on research funding contributed
Awwwww!
:(
CSE Research Avenues
nanoelectronics
quantum chemistry
computational chemistry/biochemistry
fluid dynamics
etc, etc, etc
Ah, another computer with like 10 research applications at the same time.
nilspace
Sep 5, 2003, 07:50 AM
They are not running OSX (any version). I've worked with the Admin who is running/working on the project. They have done a special build of Darwin (the open-source kernel under Aqua that runs OSX). They are hoping this does the trick. However, if it doesn't, they are already planning on falling back on Linux to run the cluster.
whooleytoo
Sep 5, 2003, 08:03 AM
Originally posted by Nermal
With the number of lawsuits going around these days, I sense another one coming along :(
You can? Isn't that... Deja Vu? :)
Myuuchan
Sep 5, 2003, 08:57 AM
Originally posted by nilspace
They are not running OSX (any version). I've worked with the Admin who is running/working on the project. They have done a special build of Darwin (the open-source kernel under Aqua that runs OSX). They are hoping this does the trick. However, if it doesn't, they are already planning on falling back on Linux to run the cluster.
Now that's interesting, because they mentioned at least twice that they'd be running OS X. How odd.
Originally posted by ZildjianKX
Can someone tell me how exactly 1,100 Dual G5s being shipped to them was supposed to slow down shipping of Dual G5s for everyone else? There were supposedly over 100,000 G5s ordered, and supposedly most of those were duals...
More than half, from what I have heard. At best, this is just a nice excuse. It is good PR for Apple, but it is a lame way of trying to appease all those customers who ordered these machines when they were announced back in June. You would think Apple wants to keep the Mac faithful who buy the newest stuff happy...
eddyg
Sep 5, 2003, 09:44 AM
Originally posted by nilspace
They are not running OSX (any version). I've worked with the Admin who is running/working on the project. They have done a special build of Darwin (the open-source kernel under Aqua that runs OSX). They are hoping this does the trick. However, if it doesn't, they are already planning on falling back on Linux to run the cluster.
And how is "a special build of Darwin" not OS X? OS X is the entire package, including kernel, not just the GUI.
So if they are running Darwin, then they are running OS X.
Cheers, Edward.
ryan
Sep 5, 2003, 09:53 AM
Originally posted by admford
Actually, keeping all of the machines the same on a cluster helps with programming. Since all machines are equally fast, you don't have to worry much about the timing of data transfers to the server. When you have a cluster of different machines, programming becomes more problematic, and Virginia Tech wants to run at least a basic benchmark to see the effective speed of the cluster.
I'm sorry but this is simply false. There are no real timing issues that have to be dealt with in a situation like this. As each node completes its work unit, it contacts the main server and basically says "Here are my results for work unit X, let me know when you have another [work unit] ready for me." The server receives the completed work unit, places it in the completed/verify bin/queue and then hands out the next work unit to the waiting node.
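A minimal sketch of the pull-style master/worker queue described above, in Python, with threads standing in for cluster nodes. The function `run_work_queue` and its parameters are hypothetical; a real cluster would hand out work units over the network (for example via MPI or sockets), not through an in-process queue.

```python
import queue
import threading


def run_work_queue(work_units, num_workers, process):
    """Hand each work unit to whichever worker asks next; collect results by unit id."""
    pending = queue.Queue()
    for unit_id, data in enumerate(work_units):
        pending.put((unit_id, data))

    completed = {}  # the "completed/verify" bin, keyed by work unit id
    lock = threading.Lock()

    def worker():
        while True:
            try:
                # "Let me know when you have another work unit ready for me."
                unit_id, data = pending.get_nowait()
            except queue.Empty:
                return  # queue drained: this node goes idle
            result = process(data)
            with lock:
                # "Here are my results for work unit X."
                completed[unit_id] = result

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [completed[i] for i in range(len(work_units))]


if __name__ == "__main__":
    print(run_work_queue([1, 2, 3, 4, 5], num_workers=3, process=lambda x: x * x))
```

Because workers pull work instead of being assigned a fixed share, a faster node simply comes back for more units sooner.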
whooleytoo
Sep 5, 2003, 10:20 AM
So many things in the article are striking:
- Apple the cheapest option (is this a first? ;-)
- Apple (using IBM chips) competing with IBM (using AMD's!)
- The G5 option requiring fewer processors than the Itanium or Opteron options!
rundevilrun
Sep 5, 2003, 10:43 AM
Originally posted by whooleytoo
You can? Isn't that... Deja Vu? :)
Deja Vu - Thousands of beautiful computers and three ugly ones... :D
For those who don't get it: Deja Vu is also the name of a chain of strip clubs whose slogan is "thousands of beautiful girls and three ugly ones"
Kurt
Sep 5, 2003, 10:43 AM
I wonder how long before someone else is building these multiprocessor computers using IBM blades with G5s.
Wonder Boy
Sep 5, 2003, 10:50 AM
I had to read that headline over 3 or 4 times. Apple is the cheapest option? Where am I? What year is this?
network23
Sep 5, 2003, 10:53 AM
"1,100 Dual G5's"
Man, what some people will do to make OS X run as fast as OS 9!
Sun Baked
Sep 5, 2003, 11:11 AM
Originally posted by eddyg
And how is "a special build of Darwin" not OS X? OS X is the entire package, including kernel, not just the GUI.
So if they are running Darwin, then they are running OS X.
Cheers, Edward.
Apple markets Darwin and OS X as two different products with separate EULAs. While OS X has a copy of Darwin at its core, Darwin is not equal to OS X.
It does get people mixed up, and it drives some people as crazy as the VERY old argument: if Windows 1.0 has PC DOS at its core, then if I'm running DOS I'm running Windows.
AidenShaw
Sep 5, 2003, 11:14 AM
Originally posted by Kurt
I wonder how long before someone else is building these multiprocessor computers using IBM blades with G5s.
You won't see that - IBM will call them PPC970 !! ;)
Actually, the IBM BladeCenter would be a much cleaner solution.
You'd get 168 PPC970 CPUs in 6 square feet (one 19" rack), so 14 racks would hold more CPUs than the 1100 PowerMacs. The entire supercomputer could fit in a 10' by 10' cubicle!
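The rack arithmetic can be checked quickly. The 168-CPUs-per-rack and 6-square-feet figures are the poster's; the footprint ignores aisle and service clearance, so treat the floor-space number as a rough lower bound.

```python
CPUS_PER_RACK = 168       # poster's figure for one 19" rack of BladeCenter chassis
RACKS = 14
RACK_FOOTPRINT_SQFT = 6   # "168 PPC970 CPUs in 6 square feet"
POWERMACS = 1100
CPUS_PER_POWERMAC = 2     # dual 2.0GHz G5

blade_cpus = CPUS_PER_RACK * RACKS               # 2352 CPUs
powermac_cpus = POWERMACS * CPUS_PER_POWERMAC    # 2200 CPUs
floor_area = RACK_FOOTPRINT_SQFT * RACKS         # 84 sq ft, under a 10' x 10' (100 sq ft) cube

print(blade_cpus, powermac_cpus, floor_area)
```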
You'd also have redundant power, remote management hardware, advanced ECC memory,....
The only big disadvantage would be that the blades don't have PCI slots, so no InfiniBand unless IBM makes an InfiniBand option. Each blade has dual GigE plus dual 2Gbps Fibre Channel.
aethier
Sep 5, 2003, 11:30 AM
Originally posted by network23
"1,100 Dual G5's"
Man, what some people will do to make OS X run as fast as OS 9!
lol
anyways, that is going to be one major kick-ass computer.
suddenly it doesn't seem as impressive to brag that my school has 6 eMacs... :(
mind you, my school is only a high school
aethier
JoeRadar
Sep 5, 2003, 11:39 AM
Originally posted by eddyg
So if they are running Darwin, then they are running OS X.
I wonder what the full OS X has that Darwin doesn't that might be wanted in a compute farm?
I have always pictured compute/render farms as very simple OSes (e.g., stripped linux, no GUI, ...), perhaps with an efficient network stack. Load some custom code (e.g., folding code) on each box, send it some data, and give the code as much of the CPU as possible.
A full Windows XP or even OS X might be overkill for such a box, and may slow it down. Anyone have any direct experience?
weez75
Sep 5, 2003, 11:39 AM
Originally posted by eddyg
And how is "a special build of Darwin" not OS X? OS X is the entire package, including kernel, not just the GUI.
So if they are running Darwin, then they are running OS X.
Cheers, Edward.
Darwin is not OSX. Darwin is an open-source operating system that is the foundation for OSX, and it's based on BSD.
From Apple's Darwin page:
Q. How does Darwin relate to Mac OS X?
A. Darwin is the core of Mac OS X. All software built for Darwin should be able to run unmodified on Mac OS X. However, because Darwin by itself does not encompass all of the features of Mac OS X, software that depends on higher-level features of Mac OS X (such as the Cocoa and Carbon toolkits) will not run on a stand-alone Darwin system.
Check it out at http://developer.apple.com/darwin/projects/darwin/faq.html
So if the operating system for the VT supercomputer is Darwin then it is not necessarily OSX. If the cluster is running an early version of Panther as reported in several places then it is running OSX with Darwin at the core.
Raid
Sep 5, 2003, 11:41 AM
Originally posted by rundevilrun
Deja Vu - Thousands of beautiful computers and three ugly ones... :D
For those who don't get it: Deja Vu is also the name of a chain of strip clubs whose slogan is "thousands of beautiful girls and three ugly ones"
Hey rundevilrun, are you a local (in this case, Toronto)? :)
More on topic... I wonder if the computer array program Pooch would be able to work with the new G5s and some older computers. I know it worked between G4s and G3s. It'd be interesting to see if three generations of Apple computers could be made into a functioning array. If it worked, I'd keep my old computer and just hook it up to the array as a background processor!
Raid
Flowbee
Sep 5, 2003, 11:44 AM
Originally posted by Wonder Boy
I had to read that headline over 3 or 4 times. Apple is the cheapest option? Where am I? What year is this?
Yeah, forget about the speed. This has to be the best news Apple's P.R. department has had in a long time. :)
JoeRadar
Sep 5, 2003, 11:46 AM
Originally posted by whooleytoo
- Apple (using IBM chips) competing with IBM (using AMD's!)
IBM has a contract to build a supercomputer called ASCI Purple with 12,544 POWER5 chips (presumably the basis for Apple's next G5 (G6?) chip).
(CNET story (http://news.com.com/2100-1001-984808.html?tag=fd_top), IBM press (http://www-1.ibm.com/servers/eserver/pseries/news/pressreleases/2002/nov/asci_purple.html))
Sun Baked
Sep 5, 2003, 11:47 AM
Originally posted by weez75
So if the operating system for the VT supercomputer is Darwin then it is not necessarily OSX. If the cluster is running an early version of Panther as reported in several places then it is running OSX with Darwin at the core.
They could be running the Darwin 7 Preview Seed that corresponds to select portions of the 2003 WWDC Panther seed.
There may have been some updates in the seed that they needed.
Tim Flynn
Sep 5, 2003, 12:05 PM
Originally posted by Nermal
Hmm, I've got an icon called Déjà Vu in my System Prefs. It's part of Toast 6. With the number of lawsuits going around these days, I sense another one coming along :(
Any comments on Toast 6?
I want to use it for two things, making SVCDs from iMovie (from a miniDV) and recording audio from the optical inputs (when I get my G5)
dongmin
Sep 5, 2003, 12:12 PM
Originally posted by iPC
More than half, from what I have heard. At best, this is just a nice excuse. It is good PR for Apple, but it is a lame way of trying to appease all those customers who ordered these machines when they were announced back in June. You would think Apple wants to keep the Mac faithful who buy the newest stuff happy...
Umm, it's not an excuse. Apple never said that your order has been delayed because of the VT cluster. All they said was that they're filling education orders first to coincide with the back-to-school season. (Of course, it's not at all clear what they mean by edu orders.) The VT thing is just something sexy for us rumor folks to talk about. I don't think Apple itself has said a peep about the cluster yet.
szark
Sep 5, 2003, 12:29 PM
Originally posted by dongmin
The VT thing is just something sexy for us rumor folks to talk about. I don't think Apple itself has said a peep about the cluster yet.
Yes, they have. They linked to the CNET article on their "hot news" page, which is essentially a confirmation from Apple.
Although they certainly haven't said the cluster is the reason for the shipping delay.
jettredmont
Sep 5, 2003, 12:34 PM
Originally posted by woodsey
Just a quick question...
When Apple releases dual 3GHz G5s next August, can they simply be added to the cluster to increase the processing power, or do all the machines need to be the same?
Outsider guess:
Most likely they can just be added, but will not be fully utilized in a preferential manner. They'll work 50% faster than the other computers, and so will be pulling tasks 50% faster than the other computers around them, but won't end up getting "larger" tasks to keep the setup/execute ratio optimal unless the app in question "noticed" their relative speed.
This is based on a guess about how an individual application would structure itself, so might not apply in all cases.
IMHO, how well a faster machine would fit in would be a product of application design over system design.
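To illustrate the setup/execute point, here is a rough model. It assumes each task carries a fixed setup cost (dispatch, data transfer) that does not shrink on a faster CPU, while the compute portion scales with speed; the setup and work numbers are hypothetical.

```python
def overhead_fraction(setup_s, work_units, speed):
    """Fraction of a task's wall time spent on fixed setup rather than computing.

    Assumption: setup cost is fixed per task and does not get faster on a
    faster CPU; only the compute part (work_units / speed) scales.
    """
    execute_s = work_units / speed
    return setup_s / (setup_s + execute_s)


SETUP = 2.0   # hypothetical seconds of per-task overhead
WORK = 100.0  # hypothetical work units per task, sized for the baseline node

baseline = overhead_fraction(SETUP, WORK, speed=1.0)                  # 2 / (2 + 100)
fast_same_task = overhead_fraction(SETUP, WORK, speed=1.5)            # 2 / (2 + 66.7): worse ratio
fast_scaled_task = overhead_fraction(SETUP, WORK * 1.5, speed=1.5)    # task grown 50%: back to baseline

print(f"{baseline:.4f} {fast_same_task:.4f} {fast_scaled_task:.4f}")
```

Handing the 50%-faster node tasks that are 50% larger restores the baseline overhead ratio, which only happens if the application notices the node's speed.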
jadedchameleon
Sep 5, 2003, 12:36 PM
Originally posted by ryan
I'm sorry but this is simply false. There are no real timing issues that have to be dealt with in a situation like this. As each node completes its work unit, it contacts the main server and basically says "Here are my results for work unit X, let me know when you have another [work unit] ready for me." The server receives the completed work unit, places it in the completed/verify bin/queue and then hands out the next work unit to the waiting node.
Actually the original poster could be correct. I would guess some variant of MPI will be used in this, and MPI does NOT specify that you do workunit queuing. Simpler MPI programs simply divide the work into equal parts, and ship it off. Many problems depend on previous data, so the original poster was correct that timing issues (mostly inefficiency--the faster nodes are only utilized as much as the slower ones) can show up.
Considering the fact that many of these programs (this comes from working with a Top500--was around #100--cluster at a major university) are written by non-CS types--such as chemists, mechanical engineers and such, you can expect to see less-than-optimal programming.
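A toy model of the "equal parts plus dependent steps" case: each step is split evenly, and every node must wait at a barrier for the slowest one before the next step starts. This is plain Python arithmetic, not MPI; `lockstep_run` and its inputs are hypothetical.

```python
def lockstep_run(node_speeds, work_per_step, steps):
    """Equal static split with a barrier after every step (results feed the next step).

    Speeds are relative work units per second. Returns the total run time
    and each node's utilization (busy time / step time).
    """
    n = len(node_speeds)
    chunk = work_per_step / n                 # every node gets the same share
    step_times = [chunk / s for s in node_speeds]
    step_time = max(step_times)               # all nodes wait for the slowest at the barrier
    total_time = step_time * steps
    utilization = [t / step_time for t in step_times]
    return total_time, utilization


# Three baseline nodes plus one node that is 50% faster.
total, util = lockstep_run([1.0, 1.0, 1.0, 1.5], work_per_step=400.0, steps=10)
print(total, [round(u, 3) for u in util])
```

The fast node finishes each step early and sits idle about a third of the time, so the run takes exactly as long as it would with four baseline nodes.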
jettredmont
Sep 5, 2003, 12:36 PM
Originally posted by mvc
Yup, but the part I found unlikely was that they would run with a beta at all. ;)
Not explaining myself properly - been a long day
True enough. Although, by January 2004 (operational date of the system) I should hope Panther is no longer Beta.
nilspace
Sep 5, 2003, 12:41 PM
Originally posted by Sun Baked
They could be running the Darwin 7 Preview Seed that corresponds to select portions of the 2003 WWDC Panther seed.
There may have been some updates in the seed that they needed.
I don't believe so. However, since Darwin is open-source, the Admin was able to tweak it and do a rebuild himself.
I'll try and find out more info and post it here.
ffakr
Sep 5, 2003, 12:46 PM
Originally posted by nilspace
They are not running OSX (any version). I've worked with the Admin who is running/working on the project. They have done a special build of Darwin (the open-source kernel under Aqua that runs OSX). They are hoping this does the trick. However, if it doesn't, they are already planning on falling back on Linux to run the cluster.
The Admin should explain to you that Darwin is not an open-source kernel. Mach is the kernel in Darwin; Darwin is an open-source OS, based on FreeBSD, that makes up the core of Mac OS X.
Essentially (over simplified):
FreeBSD + Mach + Apple's Mods = Darwin
Darwin + Quartz + Aqua + OpenStep = OS X
jettredmont
Sep 5, 2003, 12:48 PM
Originally posted by ZildjianKX
Can someone tell me how exactly 1,100 Dual G5s being shipped to them was supposed to slow down shipping of Dual G5s for everyone else? There were supposedly over 100,000 G5s ordered, and supposedly most of those were duals...
First, no one has officially given this as an excuse for delaying consumer shipments (although it is implied as such).
Second, there is a bit more involved in Apple fulfilling a large bid-contract order like this than in fulfilling 1,100 separate G5 orders. I seriously doubt Apple just took 1,100 G5s off the line, slapped them in boxes, and called UPS (or FedEx). Large customers expect a huge amount of support, and suppliers (especially Apple) generally give it.
jettredmont
Sep 5, 2003, 12:52 PM
Originally posted by eddyg
And how is "a special build of Darwin" not OS X? OS X is the entire package, including kernel, not just the GUI.
So if they are running Darwin, then they are running OS X.
Cheers, Edward.
And you can download the Linux kernel, compile it, and suddenly be running Red Hat!
While OS X is based on the Darwin kernel, the Darwin kernel is not OS X. Although one can imagine such vagaries disappearing in a PowerPoint presentation to a bunch of college kids.
Also, by sticking with Darwin builds only, VT keeps the option of tweaking the OS itself to run more optimally on their servers. It would be extremely hard to tweak the Darwin-under-Panther OS without NDAs and serious help from Apple.
Finally, as others have said, I can think of nothing that OS X adds on top of Darwin that a cluster would need. You're obviously not going to be doing Quartz compositing on these beasties!
ffakr
Sep 5, 2003, 01:00 PM
Originally posted by jettredmont
And you can download the Linux kernel, compile it, and suddenly be running Red Hat!
Well, no, you'd be running a Linux kernel. Red Hat is just a collection of GNU software packages with a Linux kernel. If you downloaded Red Hat, you'd be running Red Hat. :-)
Finally, as others have said, I can think of nothing that OS X adds on top of Darwin that a cluster would need. You're obviously not going to be doing Quartz compositing on these beasties!
How about the entire rapid development environment. Xcode would be pretty sweet in a 1100 node cluster with a 20Gbit network fabric. It would bring new meaning to distributed compiles.
Running OS X not only gets you the development tools, but it also gets you access to the whole RAD environment. I'm sure that professors would rather be running code than writing it.
Also, there is no reason why you have to run Quartz and Aqua on OS X. It's trivial to boot it to a command line, and it's trivial to run X Window in place of Quartz/Aqua.
nilspace
Sep 5, 2003, 01:38 PM
Originally posted by ffakr
The Admin should explain to you that Darwin is not an open-source kernel. Mach is the kernel in Darwin; Darwin is an open-source OS, based on FreeBSD, that makes up the core of Mac OS X.
Essentially (over simplified):
FreeBSD + Mach + Apple's Mods = Darwin
Darwin + Quartz + Aqua + OpenStep = OS X
You're right. My fault for not specifying this explicitly. But really Darwin is a branch of the Mach kernel. And are you sure that Darwin is based on FreeBSD and not just a forward implementation?
Of course, I remember working on the Mach on a NeXT box (which I still have lying around somewhere).
cbfro
Sep 5, 2003, 02:02 PM
Just in case some of you don't realize this already, VT ordered these computers right after WWDC, which put them in the same batch as everyone else who ordered that day. They haven't even received the machines yet, and won't get any until Saturday. For all we know, they actually placed their order before those of you complaining of delays, and people who ordered after they did could have already gotten their G5s.
Myuuchan
Sep 5, 2003, 02:04 PM
Originally posted by jettredmont
While OS X is based on the Darwin kernel, the Darwin kernel is not OS X. Although one can imagine such vagaries disappearing in a PowerPoint presentation to a bunch of college kids.
Uh. That presentation was geared towards faculty, staff, and grad research students in ECE/CS. Total attendance? 30 tops, mostly faculty/staff.
uberman42
Sep 5, 2003, 02:43 PM
You say Darwin, I say OS X, you say Mach, I say Darwin; OS X, Darwin, Mach, Darwin, OS X, let's call the whole thing off...or read the FAQ... (http://developer.apple.com/darwin/projects/darwin/faq.html)
Rocketman
Sep 5, 2003, 03:13 PM
Okay, so this whole thread has me thinking like a programming shop or a render shop might.
If this 1000 (100 spares) node compute farm (cluster format) for $5.2m is highly practical, whether with OS X proper, YDL, Darwin, or a recompile of Darwin, according to my in-house geekzoids' needs, then it follows that perhaps hundreds of small programming, rendering, and other compute shops/companies need a 4 or 8 CPU version of this infiniband/rack/G5 thing like early next week.
So what is the PO item listing for, say, an 8-machine 2x2 G5 CPU cluster farm?
Looks to me like something like this:
8 Apple 2x2 G5 CPUs (1 is master)
1 Infiniband router
8 infiniband cables
24 1GB memory sticks
8 Infiniband PCI cards
2 Racks
1 UPS (specify)
1 Liquid cooler system (specify)
8 1000BT back channel cables
1 1000BT router/switch
8 FW800 cables
1 FW800 router
1 NAS box with infiniband, FW800 and dual 1000BT
And the cost for this mess?
Rocketman
AidenShaw
Sep 5, 2003, 03:27 PM
Originally posted by Rocketman
then it follows that perhaps hundreds of small programming, rendering, and other compute shops/companies need a 4 or 8 CPU version of this infiniband/rack/G5 thing like early next week
Few would need InfiniBand....
For example, look at SETI@home, Folding, Grid.Org or any of those massively parallel architectures. They work just fine with dial-up modem connections, maybe a little faster with DSL but not much.
The reason is that the time spent transferring the "work units" is a very small fraction of the time spent analyzing it - a couple minutes at 56K compared to at least several hours to process.
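The transfer-versus-compute ratio is easy to put numbers on. The 1 MB work-unit size and 3-hour compute time below are assumptions for illustration, not SETI@home or Folding figures, and the modem is treated as a clean 56 kbit/s link.

```python
def comm_fraction(work_unit_bytes, link_bits_per_s, compute_s):
    """Return (transfer seconds, share of total time spent moving the work unit)."""
    transfer_s = work_unit_bytes * 8 / link_bits_per_s
    return transfer_s, transfer_s / (transfer_s + compute_s)


# Hypothetical numbers: a 1 MB work unit over a 56 kbit/s modem, 3 hours of crunching.
transfer, frac = comm_fraction(1_000_000, 56_000, 3 * 3600)
print(f"transfer {transfer / 60:.1f} min, {frac:.1%} of total time")
```

Under these assumptions the download takes a couple of minutes and accounts for only about 1% of the node's time, which is why a faster link would barely help this kind of workload.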
Fast low-latency transports tip the equation upside down - you can move quite large amounts of data for relatively short processing times. Instead of a cumbersome "work unit" batch approach, you can use MPI, OpenMP and other parallel programming tools to do very fine-grained distributed processing.
For example, a program loop could be written to send each iteration of the loop to a different CPU to run in parallel. To do this, you need a good network fabric.
For rendering and other more coarse-grained tasks, plain old GigE will be just as good.
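A back-of-the-envelope model of why fine-grained parallelism needs a fast fabric: if every loop iteration sent to another CPU costs one message exchange that is not overlapped with computation, parallel efficiency is compute / (compute + latency). The latency values are illustrative placeholders, not measurements of any particular GigE or InfiniBand setup.

```python
def parallel_efficiency(compute_per_iter_s, latency_s):
    """Efficiency when every iteration pays one un-overlapped message exchange.

    Speedup on p CPUs is p * compute / (compute + latency), so efficiency
    (speedup / p) depends only on the compute-to-latency balance.
    """
    return compute_per_iter_s / (compute_per_iter_s + latency_s)


# Illustrative latencies only; real figures depend on NICs, drivers and the MPI stack.
GIGE_LATENCY = 50e-6
INFINIBAND_LATENCY = 5e-6

for label, work in [("fine-grained, 10 us/iteration", 10e-6), ("coarse-grained, 1 s/task", 1.0)]:
    gige = parallel_efficiency(work, GIGE_LATENCY)
    ib = parallel_efficiency(work, INFINIBAND_LATENCY)
    print(f"{label}: GigE {gige:.0%}, InfiniBand {ib:.0%}")
```

With tiny iterations, latency dominates and the low-latency fabric matters a great deal; with second-long render-style tasks, both links come out near 100%.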
ffakr
Sep 5, 2003, 03:54 PM
Originally posted by uberman42
You say Darwin, I say OS X, you say Mach, I say Darwin; OS X, Darwin, Mach, Darwin, OS X, lets call the whole thing off...or read the FAQ... (http://developer.apple.com/darwin/projects/darwin/faq.html)
well, the thing is...
Darwin is an Operating System.
Mach is a Kernel.
Mach is one piece of software whose main function is to control the hardware (and interface it to other software)
Darwin is much more than just a Kernel. It is the sum of a Kernel, drivers, and lots of software.
Each of these terms is a superset containing the component(s) of the previous term.
Mach (kernel) -> Darwin (core OS) -> OS X (Apple's BSD based OS)
I hate to be uptight and picky, but it's inaccurate to refer to Darwin as a kernel. It confuses people.
That's like taking:
sparkplug -> drivetrain -> Car
... and mixing up all the terms. If you are talking about an engine or the whole drivetrain and you use the term sparkplug... well, you're really confusing matters.
It gets worse when you are talking about the differences between the performance of a Honda S2000 and a Nissan 350Z and you keep comparing the drivetrain of the Honda to the sparkplugs of the Nissan.
:-) <-- not trying to be a jerk.. just trying to be accurate.
Originally posted by Rocketman
8 Apple 2x2 G5 CPUs (1 is master)
24 1GB memory sticks
Looks like an odd number of DIMMs per box...can't do that...
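The DIMM problem in code: 24 sticks across 8 machines is 3 per box, and the Power Mac G5 takes memory in matched pairs. The `slots_per_box=8` default assumes the dual-processor model, and `dimms_per_box` is a hypothetical helper.

```python
def dimms_per_box(total_dimms, boxes, slots_per_box=8):
    """Split a memory order evenly and check it against the install-in-pairs rule."""
    per_box, leftover = divmod(total_dimms, boxes)
    ok = leftover == 0 and per_box % 2 == 0 and per_box <= slots_per_box
    return per_box, ok


print(dimms_per_box(24, 8))  # 3 per box: odd, can't be installed as pairs
print(dimms_per_box(32, 8))  # 4 per box: two matched pairs each
```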
WM
neilt
Sep 5, 2003, 10:26 PM
Originally posted by AidenShaw
Few would need InfiniBand....
…
This is a quote from an email from the admin of #65 on the Top500.
"The interconnects play a HUGE role in achieving those sorts of numbers. The latency with Gig E is much more than myrinet or any other highspeed interconnect. I can achieve about 500 G/Flop with the 128 myrinet nodes. It takes over twice that number of Gig/E nodes to produce similar numbers."
Our cluster has 512 nodes. Only 128 of those 512 have Myrinet (similar to, yet slower than, InfiniBand).
So, even though GigE is plenty fast, it really does make a difference when you are talking about these big clusters. The name of the game is speed. The jobs that we run now take hours and days instead of weeks and months.
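Putting the admin's figures into per-node terms gives a rough reading. It treats all nodes as otherwise comparable and takes "over twice" as exactly twice, so the GigE number is an upper bound.

```python
MYRINET_NODES = 128
TARGET_GFLOPS = 500.0             # "about 500 G/Flop with the 128 myrinet nodes"
GIGE_NODES = 2 * MYRINET_NODES    # "over twice that number", so a lower bound on node count

myrinet_per_node = TARGET_GFLOPS / MYRINET_NODES   # about 3.9 GFlops per node
gige_per_node_max = TARGET_GFLOPS / GIGE_NODES     # at most about 1.95 GFlops per node

print(f"Myrinet: {myrinet_per_node:.2f} GFlops/node, GigE: under {gige_per_node_max:.2f} GFlops/node")
```

By this reading, each GigE node delivers less than half the benchmark throughput of a Myrinet node, which is the interconnect effect the admin describes.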
neilt
AidenShaw
Sep 5, 2003, 10:35 PM
Originally posted by neilt
So, even though GigE is plenty fast, it really does make a difference when you are talking about these big clusters.
I agree completely - MPI programming for parallel LINPACKD numbers on thousand node clusters is critically dependent on very fast, very low latency network fabrics.
I was replying to someone who suggested using InfiniBand for an 8 node renderfarm.
That would be a waste of money, don't you agree? With far smaller clusters, and far coarser parallelism - the network is not as critical. And besides, every box in this class on the market already includes GigE on the mobo.... Heck, even my new Dell laptop has GigE on the mobo.
neilt
Sep 6, 2003, 12:54 AM
Originally posted by AidenShaw
I was replying to someone who suggested using InfiniBand for an 8 node renderfarm.
Ah... sorry about that. Yes, you are right; I didn't read the post you were replying to correctly.
neilt
Bifrost
Sep 6, 2003, 06:12 PM
I just wanted to comment on two of the issues being discussed in this thread.
First, there were some questions about whether or not the cluster needed to be homogeneous. I doubt that it HAS to be. But you may not gain any advantage by adding faster machines to the cluster. If the programs that the cluster runs are designed to break the total job into equal-sized chunks (such as the AltiVec Fractal Carbon program available from www.daugerresearch.com) then a cluster composed of, let's say, 5 fast and 5 slow machines would have a performance equal to a cluster of 10 of the slow machines. On the other hand, if the program is written so that it constantly feeds new jobs to any available processor then you do gain some advantage by adding faster machines to the cluster. Faster processors will complete more jobs in the same time compared to slower processors. Each processor does the most it can to help complete the overall task.
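The two cases above can be simulated directly. This is a sketch under simple assumptions: identical, independent jobs, no communication cost, and five hypothetical machines that are exactly twice as fast as the other five. `static_makespan` models equal-sized chunks; `dynamic_makespan` feeds the next job to whichever machine frees up first (greedy list scheduling).

```python
import heapq


def static_makespan(speeds, total_work):
    """Equal-sized chunks, one per machine: the finish time is set by the slowest."""
    chunk = total_work / len(speeds)
    return max(chunk / s for s in speeds)


def dynamic_makespan(speeds, num_jobs, job_size):
    """Feed each job to whichever machine becomes free first."""
    free_at = [(0.0, i) for i in range(len(speeds))]  # (time machine is free, machine id)
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(num_jobs):
        t, i = heapq.heappop(free_at)
        done = t + job_size / speeds[i]
        finish = max(finish, done)
        heapq.heappush(free_at, (done, i))
    return finish


mixed = [2.0] * 5 + [1.0] * 5   # hypothetical: 5 machines twice as fast as the other 5
slow = [1.0] * 10

print(static_makespan(mixed, 1000.0), static_makespan(slow, 1000.0))  # same: 100.0 100.0
print(dynamic_makespan(mixed, 1000, 1.0), dynamic_makespan(slow, 1000, 1.0))
```

With equal chunks the mixed cluster is no faster than ten slow machines; with job feeding it finishes in roughly two-thirds of the time, close to the 1000 / 15 lower bound set by its combined speed.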
The second issue is the need for a high-speed interconnect. This is also dependent on how the programming is done. If the parallelism is coarse-grained and doesn't require a lot of data to be transmitted then Ethernet is fine. But if the parallelism is fine-scale and each step depends on the result of a previous step then the data needs to fly fast and furious and you better have a high-speed interconnect.
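A simple latency-plus-bandwidth ("alpha-beta") cost model shows why granularity decides the question. It assumes every step splits its compute evenly across the nodes and then blocks on one message exchange before the next step can begin. The latency and bandwidth numbers for each link are ballpark assumptions, not measurements of VT's hardware, and the `Link`/`run_time` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Link:
    name: str
    latency_s: float      # assumed per-message latency, seconds
    bandwidth_Bps: float  # assumed sustained bandwidth, bytes per second


def run_time(steps: int, compute_per_step_s: float, nodes: int,
             msg_bytes: float, link: Link) -> float:
    """Each step: this node's share of the compute, then one message that must
    arrive before the next step can start (no overlap of compute and comms)."""
    per_step = compute_per_step_s / nodes + link.latency_s + msg_bytes / link.bandwidth_Bps
    return steps * per_step


# Ballpark assumptions, not measured values.
GIGE = Link("GigE (TCP)", latency_s=50e-6, bandwidth_Bps=125e6)
IB = Link("InfiniBand 4x", latency_s=5e-6, bandwidth_Bps=1e9)

for link in (GIGE, IB):
    # Fine-grained: many tightly coupled steps, little compute per node per step.
    fine = run_time(steps=100_000, compute_per_step_s=1e-3, nodes=100, msg_bytes=8e3, link=link)
    # Coarse-grained: a few long steps with one larger exchange each.
    coarse = run_time(steps=10, compute_per_step_s=6000.0, nodes=100, msg_bytes=1e6, link=link)
    print(f"{link.name}: fine-grained {fine:.1f} s, coarse-grained {coarse:.1f} s")
```

Under these assumptions the fine-grained run takes about 12.4 s on GigE versus about 2.3 s on InfiniBand, because per-step latency and transfer time dwarf the 10 µs of compute each node does per step. The coarse-grained run takes about 600 s on either link, since a minute of compute per step swamps a few milliseconds of communication.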
For my (very small) cluster I use a variety of machines and Fast Ethernet because my program feeds jobs as processors become available but each job takes about an hour to complete and requires little data to be transmitted. On the other hand a program that breaks a problem into equal chunks and must transmit a lot of data would be best run on a homogeneous cluster with InfiniBand.
My guess is that VT wants a cluster that can perform well for any type of software. They will likely add faster machines when it becomes economically feasible to do so (i.e., when a 4GHz dual costs what a 2GHz dual does now), knowing that some programs can use that extra speed and some will effectively see it as just another 2GHz machine. And using InfiniBand gives the researchers using the machine the option of fine-scale parallelism if they need it. Smart move, really, if you've got $5.2 million to spend. My cluster was assembled for about $12,000, so it's not quite as nice...
websterphreaky
Sep 6, 2003, 11:35 PM
Have you noticed this rather disturbing trend from Apple over the last half decade? PR before Apple loyalists? PR before "first in line buyers"? "OS X will completely support all G3 Macs"?
I certainly hope that all the Apple Kool-Aid drinkers saying their morning prayers to Cupertino thoroughly enjoy another Apple screwing.
What's that ol' saying... oh yeah: bend over, grab your ankles and don't forget to smile. :D
jadedchameleon
Sep 7, 2003, 01:17 PM
Originally posted by websterphreaky
Have you noticed this rather disturbing trend from Apple over the last half decade? PR before Apple loyalists? PR before "first in line buyers"? "OS X will completely support all G3 Macs"?
I certainly hope that all the Apple Kool-Aid drinkers saying their morning prayers to Cupertino thoroughly enjoy another Apple screwing.
What's that ol' saying... oh yeah: bend over, grab your ankles and don't forget to smile. :D
Oh come on. You have absolutely NO idea (nor do ANY of us) exactly when this order was placed. They may have placed it as early as anyone. Besides that, almost every company in the world has the idea of "priority customers". Try getting support from Microsoft or many other major software firms sometime; it's amazing how much more they are willing to accommodate you when you're a major developer instead of joe-hacks-on-the-weekend.
It's not a friggin democracy people, and it's not any better on the other side of the fence so stop trying to color it that way.
Myuuchan
Sep 7, 2003, 04:04 PM
In interesting news, the first 126 were unpacked (and probably racked) today...
... in an hour and a half.
Woot. Go Hokies.
Johnny Mnemonic
Sep 7, 2003, 06:57 PM
As a pro Mac tech in a nearby state (MD), I have to admit that I am obsessed by the details of the development of this cluster.
For instance:
What are the details of this specialized cooling arrangement?
What are the details of the custom-built racks? What makes them special?
Is each G5 shipping with a keyboard, mouse, and CDs? Do they have the Infiniband PCI pre-installed?
What is going to be done with all that extra HD space? That's a lot of pr0n!
Of perhaps wider interest:
I saw a post from a Mellanox developer on the Darwin Dev mailing list around mid-July. That would lead me to believe that their drivers aren't ready yet.
Are they?
When they are, who will own them--Mellanox, Apple, or VT?
Will they be available for public use? Let's say I wanted to build my own cluster...(and I might, but maybe at 1/10th the size)
Can you tell us more about Deja Vu? How will that work? I saw an interesting post on /. about the use of non-ECC RAM, and what that will mean for errors in a cluster of this size. How will Deja Vu protect against that kind of issue, namely errors in off-the-shelf equipment?
What is the expected fail rate of these units? How will it be handled? Repair/replace?
Why is Darwin the first choice? Why is Linux the fallback? Why isn't the cluster understood well enough already to know whether or not Darwin will meet the needs (i.e., what if the fallback doesn't work either)?
What kind of monitoring of each node is going to be done?
Let me invite you to post these details not only here, but also to a new Apple list for clusters: http://lists.apple.com/mailman/listinfo/clusters. Particularly the info on IB; there's interest there in using these drivers, if they're going to be available.
And a shout-out to neilt--you don't know me, but I'm guessing you work for a little outfit called t-gen. Like strangers in the night...
neilt
Sep 7, 2003, 07:45 PM
Originally posted by Johnny Mnemonic
As a pro Mac tech in a nearby state (MD), I have to admit that I am obsessed by the details of the development of this cluster.
And a shout-out to neilt--you don't know me, but I'm guessing you work for a little outfit called t-gen. Like strangers in the night...
Yep, it's a little outfit with a huge amount of processing power! :)
I used to work for NHGRI at the NIH as the lead mac tech, now I run technical support for TGen.
Where are you in DC? I used to live at the Constitution and 14th on Capitol Hill.
neilt
Myuuchan
Sep 7, 2003, 07:50 PM
I'm afraid I can't answer any questions about the physical details of the cluster until Wednesday, i.e., the morning after I spend time in assembly.
themadchemist
Sep 7, 2003, 09:28 PM
Originally posted by Booga
The only acronym for PSC I know, and one that fits with supercomputing, is the Pittsburgh Supercomputing Center, in Pittsburgh, PA. It's one of the original NSF supercomputer centers, operated from Carnegie Mellon University, and one of the reasons the internet's predecessor was created. I believe PSC upgraded through the years from a Cray X-MP to a Y-MP to a T3D, and who knows what they have now.
It's kind of funny and makes a certain amount of sense they'd be involved. CMU must have been one of the biggest customers for NeXT computers back in the day, and has always maintained decent Mac clusters as well as their UNIX and Windows offerings for their students.
Oh, Carnegie Mellon is quite pro-Macintosh. Two years ago, when all the universities were sending me their brochures, I was considering applying to Carnegie Mellon. I was looking over their application, and there was one part that dealt with submissions of digital media (I think it was for art students, but I was reading it anyway, don't ask me why). Well, it said that they would ONLY accept submissions that could be read on a Mac.
I took from that that the school has a pretty strong Macintosh streak. It may just be the artsy people, but I'm not sure; I had the feeling that they were kind of Mac-oriented in general.
niter
Sep 10, 2003, 03:33 PM
Originally posted by iPC
More than half, from what I have heard. At best, this is just a nice excuse. It is good PR for Apple, but it is a lame way of trying to appease all those customers that ordered these machines when they were announced back in June. You would think Apple wants to keep the Mac faithful that buy the newest stuff happy...
I will admit that I am a complete newbie when it comes to poking around in Apple info, but I do know Tech. Even if Apple did not have a commitment to work with educational needs, I think it would be justified for Tech to commandeer a significant amount of any new Apple products, based on the fact that Tech supplies most of its computing facilities with brand new Apple products. In that respect, Tech is a VERY loyal customer. If you do not believe me, swing by the Math Emporium, where 500 brand new iMacs reside. That computing lab always has the newest Apple computers available. Then you can visit the library and the numerous other computer labs on campus to see the vast array of Apple products. Then look at the Apple computers Tech provides to its professors. I really wonder how many students Tech has made into loyal Apple users. All I know is that if I want to find the newest and latest when it comes to an Apple desktop, all I have to do is walk into one of Tech's computer labs.
Putting all that aside, so what if Apple sells Tech a bunch of Macs for PR? Good for Apple. Their name is getting splashed all over the place and piquing more interest in their products. Education causes, whether for Tech or another university, are very noble, and I say kudos to Apple for putting education and research first.
nilspace
Sep 11, 2003, 09:54 AM
Originally posted by Johnny Mnemonic
For instance:
What are the details of this specialized cooling arrangement?
What are the details of the custom-built racks? What makes them special?
Is each G5 shipping with a keyboard, mouse, and CDs? Do they have the Infiniband PCI pre-installed?
Well, I've heard some of the answers to these q's from the SysAdmin:
-> The custom racks are meant to hold 5 G5's in their cases.
-> Each system *is* shipping with keyboard, mouse, etc. (even the styrofoam packaging) all of which is being stored at a warehouse.
Spock
Sep 11, 2003, 10:37 AM
I want to see pictures. I bet in about 5 years we will see these on eBay, or whatever is around in 5 years.