From Apple's own Snow Leopard Page:
"Grand Central, a new set of technologies built into Snow Leopard, brings unrivaled support for multicore systems to Mac OS X. More cores, not faster clock speeds, drive performance increases in today's processors. Grand Central takes full advantage by making all of Mac OS X multicore aware and optimizing it for allocating tasks across multiple cores and processors. Grand Central also makes it much easier for developers to create programs that squeeze every last drop of power from multicore systems."
So it takes developer effort: it's not automatic at the application level.
From reading about other OSes, such as the Linux kernel, this is apparently complicated. At a basic level, multiple cores can be treated as separate CPUs, but historically the kernel itself ran under a single big lock, so system calls from separate processes got serialized even when they shouldn't have been. Linux has had THIS problem pretty much fixed for years, and so, I assume, has Mac OS X. The added complexity arises when you start apportioning processes and threads to the available cores. They are NOT all created equal for this purpose, ranging from Hyper-Threading on the P4 to more modern upcoming systems with 16 threads over 8 cores over 2 chips expected on the upcoming MP.
I would assume that, for example, 2 threads on one core share more resources than 2 threads spread over 2 cores. Multiple cores on one physical CPU also share resources, like cache. So you probably don't want 2 threads that both do heavy floating-point work sharing a core, even if at a basic level the OS could treat them as independent (logical) cores: they contend for cache, execution units, and other on-chip resources. The P4 could gain some performance from this in many cases, but contention was bad enough that the speedup wasn't linear.
So, ideally, 2 completely separate processes that both access a lot of memory should probably be on separate physical CPUs, while 2 or more threads from within the same process can be scheduled on the same core or CPU, since they share an address space and can reuse on-chip TLB entries, caches, etc.
The OS also doesn't want to unnecessarily move a process from one core to another core or CPU, since the process then loses its warm cache state.
Therefore, it is likely that Apple developers have been working on dramatically improved scheduling, on updated APIs that let applications decompose their work for a multicore world (and give the OS hints about how the application is structured), and maybe on APIs that let an application say: I want to process this data in RAM, please spawn as many threads as is appropriate given the number of CPUs and available resources, and handle it in parallel. You don't want to hard-code 16 threads if you are running on an Intel Core 2 Duo, but you also don't want to hard-code 2 threads if there are 6 cores sitting idle on your system.
Apple has probably gone through some system routines too, updating them to use these new APIs. Maybe not common Unix utilities like cp, but Finder, Mail, etc., which can use multiple cores for some functionality. cp, for example, doesn't seem like a utility that benefits much from multithreading: making separate read and write threads and copying data between them would generate a lot of overhead. Though, I suppose you COULD spawn off 3 threads for a command like "cp file1 file2 file3 dest_dir/". The downside is the semantics of what happens if an error occurs partway through, so I doubt it.