The problem from a software development viewpoint is that to use multiple cores you have to be able to break your code into parts that can be executed independently of one another. That is, there need to be parts where you can say "I need to work out the following things, but these items don't depend on the result of those items, those items don't care about some other items, etc."
Usually that isn't terribly difficult, but most current languages make flagging the separate parts a bit of a hassle. For example, in the traditional C-derived languages you have to explicitly split off a separate thread, do a bunch of stuff in that, and then do whatever synchronisation is required to get all your results back together.
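To make that split/work/synchronise shape concrete, here is a minimal sketch. It is in Python purely for brevity (the same steps apply in the C-family languages), and the names sum_of_squares and run_in_parallel are made up for illustration. Note that in standard CPython, threads doing pure number crunching won't actually run on two cores at once, so treat this as showing the structure rather than a speed-up.

```python
import threading

def sum_of_squares(numbers, results, key):
    # An independent piece of work: it reads only its own input
    # and writes only to its own slot in the results dictionary.
    results[key] = sum(n * n for n in numbers)

def run_in_parallel(first, second):
    results = {}
    # Explicitly split off a separate thread for the first piece of work.
    worker = threading.Thread(
        target=sum_of_squares, args=(first, results, "first")
    )
    worker.start()
    # Meanwhile the main thread gets on with the second piece.
    sum_of_squares(second, results, "second")
    # Synchronise: wait for the worker before combining the results.
    worker.join()
    return results["first"] + results["second"]

total = run_in_parallel(range(1, 4), range(4, 7))
print(total)
```

The join() call is the "get all your results back together" step: without it, the main thread might try to read the worker's result before it exists.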
There are a lot of things multithreading is really good for, though, because some jobs naturally split into independent pieces. Most Photoshop filters can be threaded because what they do to a pixel in the top left of the image often has nothing whatsoever to do with the data in the bottom right, and vice versa. Little things like the save command in most programs are good candidates too: duplicate the document state in memory, then farm it out to a separate thread to write to the disk. Copying in memory is much faster than writing to disk, so the user doesn't have to wait before continuing to work on their document.
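The save idea can be sketched roughly like this. It is only an illustration, not how any particular program does it: the document is assumed to be a simple dictionary holding a list of lines, and save_in_background is a hypothetical helper name.

```python
import copy
import os
import tempfile
import threading

def save_in_background(document, path):
    # Fast part: take a private snapshot of the document in memory.
    snapshot = copy.deepcopy(document)

    def write_snapshot():
        # Slow part: write the snapshot to disk on a separate thread.
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("\n".join(snapshot["lines"]))

    writer = threading.Thread(target=write_snapshot)
    writer.start()
    # Hand the thread back so the caller can wait on it if it needs to.
    return writer

document = {"lines": ["first draft"]}
path = os.path.join(tempfile.mkdtemp(), "document.txt")

writer = save_in_background(document, path)
# The user carries on editing straight away; the snapshot is unaffected.
document["lines"].append("typed while saving")

writer.join()  # only needed here so the file can be read back
with open(path, encoding="utf-8") as handle:
    saved = handle.read()
print(saved)
```

Because the writer thread only ever touches the snapshot, nothing the user does to the live document afterwards can end up half-written in the file.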
But as suggested, the main benefit today is in running programs simultaneously. Your computer spends most of its life with lots of different programs running and it is quite easy for a modern OS to split them between different cores. Launch /Applications/Utilities/Activity Monitor and have a look.
At the minute I have the following running:
Activity Monitor: 2.60% CPU utilisation, 2 threads
mdimport: 0.00% CPU utilisation, 4 threads
iTunes: 8.20% CPU utilisation, 7 threads
Adium: 0.10% CPU utilisation, 9 threads
AppleSpell: 0.00% CPU utilisation, 1 thread
Mail: 0.00% CPU utilisation, 7 threads
Safari: 0.00% CPU utilisation, 9 threads
iTunes Helper: 0.00% CPU utilisation, 1 thread
Finder: 0.00% CPU utilisation, 3 threads
SystemUIServer: 0.00% CPU utilisation, 2 threads
Dock: 0.00% CPU utilisation, 2 threads
pbs: 0.00% CPU utilisation, 2 threads
loginwindow: 0.00% CPU utilisation, 3 threads
ATSServer: 0.00% CPU utilisation, 2 threads
That is, by my calculations, a total of 54 threads. Without seeing the internal coding I can't be certain which of the threads within an individual process can be run simultaneously, but there is already a lot of scope for shifting things around between cores should any of the apps suddenly want more than the total of about 11% CPU that I'm using. I love the Intel transition!