This is a "classic" problem that has been beaten to death in the text books.
You seriously need to re-think and research the algorithm you are using. It
can be done without so much RAM.

Basically you are searching a large "tree". Like I said, this problem has been solved, analysed, re-hashed, and written about more times than anyone can count. You do not need so much RAM.

In the worst case you can implement your own memory paging system and move parts of the search tree into RAM as required. The good thing is that the memory access pattern is 100% predictable, so custom-written paging can work well. If you use asynchronous I/O you don't block while waiting on the disk.
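To make that concrete, here is a rough sketch of the non-blocking part using POSIX aio. The file name ("tree.dat"), the block size, and the offsets are made up for illustration; the real database layout will be different.

/* Sketch only: overlap computation with disk I/O using POSIX aio.
   Assumes the search tree is stored in "tree.dat" as fixed-size
   blocks; names and sizes are illustrative, not from the project. */
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLOCK_SIZE (4 * 1024 * 1024)   /* assumed 4 MB per sub-tree block */

int main(void)
{
    int fd = open("tree.dat", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    char *next_block = malloc(BLOCK_SIZE);
    if (!next_block) return 1;

    /* Kick off a read of the next block we already know we will need. */
    struct aiocb req;
    memset(&req, 0, sizeof req);
    req.aio_fildes = fd;
    req.aio_buf    = next_block;
    req.aio_nbytes = BLOCK_SIZE;
    req.aio_offset = (off_t)BLOCK_SIZE;   /* offset of the next sub-tree block */

    if (aio_read(&req) != 0) { perror("aio_read"); return 1; }

    /* ... search the sub-tree already in RAM here; the disk works
       in parallel instead of the program blocking on read() ... */

    /* Only when we actually need the next block do we wait for it. */
    while (aio_error(&req) == EINPROGRESS)
        ;   /* real code would do useful work or call aio_suspend() here */

    ssize_t got = aio_return(&req);
    printf("prefetched %zd bytes of the next sub-tree\n", got);

    free(next_block);
    close(fd);
    return 0;
}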

There is no need for so much RAM; this was solved decades ago on computers that could not address anywhere near that much.

The reason you might need so much RAM is for a task where you cannot predict which memory will be accessed next. For example, a DBMS can't know what the next query will be. But this is a very predictable "tree walk", and you should be able to swap in sub-trees before they are needed.
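And because the walk order is known ahead of time, you can hand the operating system read-ahead hints a few sub-trees before you touch them, so the data is already in the page cache when the walk gets there. Again only a sketch with an assumed file layout; the call shown is Mac OS X's fcntl(F_RDADVISE), and posix_fadvise(..., POSIX_FADV_WILLNEED) would be the Linux equivalent.

/* Sketch only: prefetch hints for a predictable tree walk (Mac OS X).
   File name, block size, and block count are assumptions. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BLOCK_SIZE (4 * 1024 * 1024)   /* assumed size of one sub-tree block */
#define LOOKAHEAD  4                   /* how many blocks to stay ahead of the walk */

/* Ask the OS to start reading a block the walk will reach soon. */
static void hint_block(int fd, long block)
{
    struct radvisory ra;
    ra.ra_offset = (off_t)block * BLOCK_SIZE;
    ra.ra_count  = BLOCK_SIZE;
    fcntl(fd, F_RDADVISE, &ra);        /* advisory only; errors can be ignored */
}

int main(void)
{
    int fd = open("tree.dat", O_RDONLY);      /* illustrative file name */
    if (fd < 0) { perror("open"); return 1; }

    char *buf = malloc(BLOCK_SIZE);
    long num_blocks = 1000;                   /* stand-in for the real tree size */

    for (long b = 0; b < num_blocks; b++) {
        /* Hint the block we will reach LOOKAHEAD steps from now. */
        if (b + LOOKAHEAD < num_blocks)
            hint_block(fd, b + LOOKAHEAD);

        /* By the time we read block b, it is usually already cached,
           so the walk rarely stalls waiting for a seek. */
        ssize_t got = pread(fd, buf, BLOCK_SIZE, (off_t)b * BLOCK_SIZE);
        if (got <= 0)
            break;
        /* ... search this sub-tree while the hinted blocks load ... */
    }

    free(buf);
    close(fd);
    return 0;
}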

In fact, if you have a fixed budget, you would do better to spend it on CPU power before you spend it on RAM.

If you do some research, you will see I have written at least one of the papers on the subject matter.

In this case, the sum of all of the disk accesses using a buffer of less than 32 GB of RAM will require 20+ computer-years to solve the problem.

Yes, I could solve it with a 128K original Mac, but it would take a million years.

Yes, I could solve it with 1 GB of RAM and a paging system that so many people think they need to point out to me, but it would take 50 years to solve.

Disk access is 10,000 times (or more) slower than RAM access. For a problem of this size, all of those trillions and trillions of disk reads, as small as each one is (load a node into the buffer, "page out" the least-recently-seen nodes), really scale the solution time up dramatically.

Put another way: A trillion times a small number is still a big number.

A billion times the same small number is still 1000 times smaller, and therefore 1000 times faster.

I can live with hundreds of billions of disk reads, but not trillions of disk reads.
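To put rough numbers on it (the 5 ms per random disk read below is my ballpark figure, not a measured value):

  1,000,000,000,000 disk reads x 5 ms   =  5 x 10^9 seconds  =  roughly 160 years
    100,000,000,000 disk reads x 5 ms   =  5 x 10^8 seconds  =  roughly 16 years
  1,000,000,000,000 RAM reads  x ~100 ns = about 10^5 seconds =  about a day

Even if the per-read time is off by a factor of a few, the 1000x gap between billions and trillions of disk reads does not change.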
 
Just an idea...

Ask a university that has a research computer array made up of Xserves to compile your database for you. You might be able to cut it down to a few months with an array of 1100+ machines, each with 4+ GB (preferably 8) of RAM, running on their downtime. If you have 1000 machines, trillion-level disk access is no longer an issue if spread out.

http://www.tcf.vt.edu/alloc_policies.html
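Rough math on the spread-out idea (the ~5 ms per read is just an assumed figure for scale):

  1,000,000,000,000 reads / 1,000 machines  =  1,000,000,000 reads per machine
  1,000,000,000 reads x ~5 ms  =  ~5,000,000 seconds  =  about 2 months of disk time per machine, running in parallel

Which is roughly where the "few months" guess comes from.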
 
Maybe the National Science Foundation would take an academic interest in this project? One of the perks of some of their fellowships is supercomputer time.
 
I was wondering if you could make use of USB flash drives in some way. You can get 16GB for under £100 now. Not as fast as RAM but much faster seek times than disks etc.

b e n
 
USB 2.0 is very slow.
The interface of a SATA solid-state drive is faster, but if it's flash it would be slow to write to and limited to 1,000,000 write cycles (even if this database is not written often, that could be a problem for some pages if you set it up as your swap device). Some 160, 256, and 512GB disks have been announced, but they will be very expensive too.
If it's a RAM SSD it makes even less sense, as you could just buy real RAM if his buffering works so well.
 
Hi Cube
I was thinking that if he's maxed out the Mac with 32 GB of RAM, then perhaps he might still be able to make use of a fast (as in seek) large disc buffer. Not knowing how the db is implemented, this may or may not be feasible. I have seen i-RAM drives by Gigabyte; they are SATA-compatible and use cheap DDR RAM… but they can only support a maximum of 4GB, unfortunately. Anyway, it was just a suggestion… probably not worth considering really!

b e n
 
This. is. incredible. Nice work. If you could lease some time on a Supercomputer cluster or something... to me that just makes a lot more sense for your application. I'm not an expert here, but meh. My 2¢.
 
Interesting.

Depressing that so many experts keep trying to tell you how to do your job, though.
 
GothicChess, you might want to have a word with Gigabyte in regards to their upcoming i-RAM 2.0: http://www.techreport.com/onearticle.x/10116

SATA2, up to 8GB of DDR2 memory per drive. Since the Mac Pro has six SATA ports, you could theoretically get up to 48GB of storage. You could use two standard big hard drives in a striped RAID for the OS and your general data storage, and then have four of those i-RAMs, 32GB worth, in a massive striped RAID array; the bandwidth would be incredible. You could program your software to use the 32GB DDR2 disk array as a sort of cache, which could greatly reduce the hard-drive access delay. Admittedly, two of those i-RAMs would have to sit outside the case since there are only two 5.25" bays, but it could be worth it. Or you could scale it up as much as you want: fill the PCIe slots with eSATA controllers and put the i-RAMs in eSATA cases, all RAIDed up. Three eSATA cards with four eSATA connections each would give a theoretical massive RAID array of 96GB.
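Rough numbers on that setup, assuming each i-RAM 2.0 can actually fill its SATA2 link (about 300 MB/s of payload per port):

  4 drives x 8GB   =  32GB striped,  roughly 4 x 300 MB/s   =  ~1.2 GB/s sequential
  12 drives x 8GB  =  96GB,          roughly 12 x 300 MB/s  =  ~3.6 GB/s, before the eSATA cards and PCIe slots become the bottleneck

Still far slower than real RAM, but seek time would essentially disappear compared to spinning disks.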
 
Interesting thread for sure. The sheer size of the problem lends itself to grid/parallel computing very nicely. A boatload of RAM (clean code or not), or lots of computers, seems the only practical approach for getting a semi-quick answer.

Spending lots to buy top-end hardware seems the most direct, but I thought I would mention Sun's Grid-for-rent solution also, since Sun hardware has come up in the thread. You buy CPU time at whatever dollar/CPU size fits your problem. It might offer a faster solution, but I don't know how cost-effective it would be compared to just buying computers outright.

Best of luck with the project.
 