How can a 3 GHz octo-core Mac Pro be the same speed as, or slower than, a 3 GHz quad-core Mac Pro in some instances? Wouldn't the octo-core Mac Pro be faster than the quad-core at everything?
No, it can be slower.
Many programs are single-threaded: they work on one thing at a time, doing the steps of the problem sequentially until the job is done. Sometimes this is due to uncreative programming, but often it is due to the nature of the problem.
If you put such a program on an octo-core, you would expect it to run at the same speed as on a quad, since it only uses one core either way. In fact there can be extra overhead from the additional cores, and the OS may move the program from core to core, which leaves it with a cold cache after each move and slows it down.
For example, think of a spreadsheet where the "total" is really the sum of ten other cells, each of which is the sum of an entire column of entries. And, in real life, many of the cells in the columns are sums and/or products of other cells in the same row. You can't calculate the total of the ten cells until *after* you've taken the sum of each of the ten columns. And you can't total the columns until you've totaled the rows.
So solving the general case is very hard: you have to work out which rows can be calculated in parallel and send those out to 8 cores, then work out which columns can be done in parallel and send those out to 8 cores.
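To make the ordering constraint concrete, here is a toy sketch in Python. The cell names and the `parallel_waves` helper are made up for illustration, and it assumes the sheet has no circular references. It groups cells into "waves": everything in one wave could go out to different cores, but each wave has to wait for the one before it.

```python
def parallel_waves(deps):
    """Group cells into waves; a cell only depends on cells in earlier waves.

    deps maps a computed cell to the cells it reads. Cells that never appear
    as a key are raw entries and land in wave 0.
    Assumes there are no circular references.
    """
    level = {}

    def depth(cell):
        if cell not in level:
            parents = deps.get(cell, [])
            # A raw entry has no parents, so it gets 1 + (-1) = 0.
            level[cell] = 1 + max((depth(p) for p in parents), default=-1)
        return level[cell]

    for cell in deps:
        depth(cell)

    waves = {}
    for cell, lvl in level.items():
        waves.setdefault(lvl, []).append(cell)
    return [sorted(waves[lvl]) for lvl in sorted(waves)]


# Hypothetical mini-sheet: two rows (qty * price), two column sums, one total.
deps = {
    "row1": ["qty1", "price1"],
    "row2": ["qty2", "price2"],
    "col_total": ["row1", "row2"],
    "qty_total": ["qty1", "qty2"],
    "total": ["col_total", "qty_total"],
}

for number, wave in enumerate(parallel_waves(deps)):
    print(number, wave)
```

Even in this tiny sheet the last two waves hold one cell each, so seven of the eight cores would sit idle while `col_total` and then `total` are computed. A real spreadsheet engine has to do this analysis for arbitrary formulas and then actually hand the work out, which is where the difficulty lies.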
___
Other jobs are very easy to parallelize, and can easily be written for multiple cores.
An example of this would be an MPEG video encoder. MPEG streams have "key frames", which are complete frames (pictures) that subsequent frames are based on. Frames between key frames are "difference" frames - they contain information about how the current frame is different from those around it (usually much less data than a full frame).
It's common for a video stream to contain a key frame every second or so. Encoding is therefore easy to parallelize - you chop the input stream into one-second chunks, and send them out to as many cores as you have available. In theory, a 3600-core machine could process an hour of video in the time it takes to do one second. (In practice, I/O ruins that simplistic statement.)
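The chunk-and-farm-out idea can be sketched in a few lines of Python. This is only a toy under stated assumptions: "frames" are plain integers, the `encode_chunk` function is a stand-in that stores each later frame as its difference from the chunk's key frame (real MPEG difference frames are far more involved), and all function names are invented here. It uses a thread pool to keep the sketch short; for CPU-bound pure-Python work in CPython you would likely swap in `ProcessPoolExecutor` to get real multi-core speed-up.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_at_key_frames(frames, key_interval):
    """Cut the stream into chunks that each start at a key frame."""
    return [frames[i:i + key_interval]
            for i in range(0, len(frames), key_interval)]


def encode_chunk(chunk):
    """Toy encoder: keep the key frame, store the rest as differences from it."""
    key = chunk[0]
    return [key] + [frame - key for frame in chunk[1:]]


def encode_stream(frames, key_interval, workers=None):
    """Encode every chunk independently on a pool of workers."""
    chunks = split_at_key_frames(frames, key_interval)
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the chunks
        # can simply be concatenated again afterwards.
        encoded = list(pool.map(encode_chunk, chunks))
    return [value for chunk in encoded for value in chunk]


frames = [10, 11, 13, 20, 22, 21, 30]
print(encode_stream(frames, key_interval=3))
```

The point is that no chunk needs anything from any other chunk, so the number of workers can be whatever the machine offers.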
Unfortunately, many programs (like video processors) weren't written to handle an arbitrary number of cores. They were originally written for a single core. When dual-core (dual-CPU) machines first came out, the programs were hacked to split the job into two pieces. When 4-core systems came out, another hack let them handle 4 pieces at once.
What we'll see soon is new versions of some of these programs, hacked to cut the job into 8 pieces. Maybe some will be rewritten to work on an arbitrary number of cores, but hacks are faster and cheaper than elegant redesign.
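For what it's worth, the difference between the hack and the general version can be quite small in code. A rough Python illustration, where `split_job` is a made-up helper and the 3600 "items" stand for the one-second chunks of an hour of video:

```python
import os


def split_job(n_items, n_pieces):
    """Return (start, stop) ranges that cover n_items in n_pieces near-equal parts."""
    base, extra = divmod(n_items, n_pieces)
    ranges, start = [], 0
    for i in range(n_pieces):
        # The first `extra` pieces take one additional item each.
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


# The hack: the piece count was frozen when the code was written.
hardcoded = split_job(3600, 4)

# The general version: ask the machine how many cores it has at run time.
adaptive = split_job(3600, os.cpu_count() or 1)

print(len(hardcoded), len(adaptive))
```

On an octo-core, the hard-coded version still produces four pieces and leaves half the machine idle, while the run-time version adapts without a new release. The expensive part of a rewrite is usually not this split but making the rest of the program cope with a piece count it doesn't know in advance.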