It's performance per watt.
The current high-end video cards draw a lot of power, up to 235 W for the new NVIDIA GT200 GPUs.
A high-end GPU may require 3x the power (wattage) of a mid-to-high-end CPU, but a GPU has 10-20x the performance of a mid-range general-purpose CPU.
Video cards today can push 500 GFLOPS of single-precision FP work.
A very high-end 3 GHz quad-core Xeon benched at 81 GFLOPS in Linpack:
http://www.intel.com/performance/server/xeon/hpcapp.htm
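A quick back-of-the-envelope sketch of the performance-per-watt argument, using only the figures quoted above (500 GFLOPS at 235 W for the GPU, 81 GFLOPS at 120 W for the Xeon):

```python
# Back-of-the-envelope GFLOPS-per-watt from the numbers quoted above.
gpu_gflops, gpu_watts = 500.0, 235.0   # GT200-class GPU, single precision
cpu_gflops, cpu_watts = 81.0, 120.0    # 3 GHz quad-core Xeon, Linpack

gpu_perf_per_watt = gpu_gflops / gpu_watts   # roughly 2.1 GFLOPS/W
cpu_perf_per_watt = cpu_gflops / cpu_watts   # roughly 0.7 GFLOPS/W

print(f"GPU: {gpu_perf_per_watt:.2f} GFLOPS/W")
print(f"CPU: {cpu_perf_per_watt:.2f} GFLOPS/W")
print(f"ratio: {gpu_perf_per_watt / cpu_perf_per_watt:.1f}x")
```

So even granting the GPU its full 235 W, it comes out roughly 3x ahead per watt on this kind of workload.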
I've got the 2.8s in my Xserve. They're fast.
Sure, GPUs run hot... but they don't run as hot as six or more 3 GHz Xeons (each a 120 W part).
That all said..
GPUs do a few things REALLY well. Some things they simply don't do at all.
Right now, most GPUs don't support double-precision floating point. They may not support the same IEEE FP standards either; it'd probably be bad if your CPU and your GPU rounded FP numbers differently when you use them for parts of the same calculation. ;-)
GPU streams also don't do a good job of talking to each other, so there are issues with scheduling.
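To make the rounding point concrete, here's an illustrative sketch (not OpenCL code) that uses Python's `struct` module to emulate a 32-bit single-precision float, the way many current GPUs would store one, next to the CPU's 64-bit double:

```python
import struct

def as_float32(x: float) -> float:
    """Round a Python double to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

a = 0.1                 # 64-bit double, as the CPU keeps it
b = as_float32(0.1)     # 32-bit single, as a single-precision GPU would keep it

print(a == b)           # False: the two roundings of the same constant disagree
print(abs(b - a))       # a tiny drift, from one constant alone
```

One constant is harmless; thousands of iterations mixing the two precisions in the same calculation is where it bites.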
I'm glad to hear that ATI is on board already. They've got double-precision FP support in some GPUs already. They even sell a computation card that uses GPUs (though it doesn't appear to be for sale anywhere).
http://ati.amd.com/products/streamprocessor/specs.html
As I understand it, OpenCL will be an abstraction layer for programmable GPUs. There should be no reason why it couldn't also support other math processors, like gaming Physics chips. The big vendor (in that small market) was recently purchased by Nvidia.

Physics chips are designed for a different set of tasks, and they run cooler. I'd love to see a card with a mess o' physics chips on it.
http://www.nvidia.com/object/nvidia_physx.html
GPUs are essentially scores, soon to be hundreds, of relatively discrete processing units. The current high-end GPUs have over a hundred "streams".
These are designed to run in parallel and therefore they can get a phenomenal amount of work done in a given time.
If your task can be broken up into lots of little pieces AND your task requires an acceptable set of math functions, it'll scream on your OpenCL system. You can literally get the performance of half a rack of cluster nodes from your desktop [likely more, due to lower latency]... certainly more if you put in several video cards.
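The "lots of little pieces" model above can be sketched in plain Python (this is the shape of the computation, not OpenCL syntax, and the `kernel` function here is a made-up example): each work-item computes one output element with no dependence on any other work-item, which is exactly what maps well onto a GPU's streams.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical "kernel": one work-item computes one output element,
# independently of every other work-item.
def kernel(i, a, b):
    return a[i] * b[i] + 1.0

a = [float(i) for i in range(1000)]
b = [2.0] * 1000

# Launch one logical work-item per index. On a GPU these would be
# spread across hundreds of hardware streams instead of a thread pool.
with ThreadPoolExecutor() as pool:
    result = list(pool.map(lambda i: kernel(i, a, b), range(len(a))))
```

If your algorithm can't be cut up this way (each piece needing the others' results, for instance), the GPU's stream count does you little good.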
Most common applications won't see a lick of OpenCL acceleration. We should (hopefully) see some acceleration in apps that apply filters to digital streams. I could potentially see GarageBand, iPhoto, and iMovie running like 'a bat out of hell'. They're kind of made for this type of thing.
Where OpenCL should really pay off is in custom apps, like the computational fluid dynamics programs that someone like NASA might use to design a replacement for the Space Shuttle. OpenCL will probably be a huge boon for tasks like protein folding. Professional 3D rendering will likely get much faster.
None of this is new stuff, mind you. A lot of people have been doing GPU programming for a long time (I know some of them). There are commercial apps out there that do GPU programming already (not MS Office and the like... usually professional apps in vertical markets like medical imaging and such).
OpenCL is cool because it's GPU programming for the masses. It's really the first time consumers will see this kind of benefit, because it'll be much, much easier for developers to wrap OpenCL acceleration into apps. In particular, Apple technologies will be OpenCL accelerated. If you embed something like Quartz Composer, or if you leverage Quartz Extreme in some way... it should just be faster. (This all depends on which Apple technologies lend themselves to OpenCL. I'm not saying either of the previous will be accelerated, but they're good candidates.)
