When the original Snow Leopard specs for OpenCL support came out, I asked AMD's Stream Computing team about the possibility of OpenCL support for the HD2000 and HD3000 series. They said that the HD2000 and HD3000 do not and will not support OpenCL due to hardware limitations. They didn't say what the limitation is, but I'm almost certain it has nothing to do with double precision floating point support. For one thing, the nVidia 8000, 9000, and GT100 series don't support DP floats either. Only the ATI HD4000 and nVidia GTX200 series do, which is why I was hoping Apple would go with the HD4670 as the mid-range GPU in the last refresh instead of the 9600M GT and GT100 series. What's more, I'm pretty sure the current OpenCL 1.0 spec only requires single precision floats, with double precision floats being an optional extension.
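To illustrate the "optional" part: in OpenCL 1.0, double precision is exposed through the cl_khr_fp64 extension, and a device advertises its extensions as a space-separated string (what you get back when querying CL_DEVICE_EXTENSIONS). Below is a minimal sketch of that check. The helper name `supports_fp64` and the sample extension strings are my own made-up illustrations, not actual driver output from any of the cards discussed here.

```python
def supports_fp64(extensions: str) -> bool:
    """Return True if a space-separated extension string lists cl_khr_fp64."""
    # Compare whole tokens so a longer name that merely contains
    # "cl_khr_fp64" as a substring does not count as a match.
    return "cl_khr_fp64" in extensions.split()


# Hypothetical extension strings, invented for illustration only.
sample_devices = {
    "device_without_doubles": "cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics",
    "device_with_doubles": "cl_khr_byte_addressable_store cl_khr_fp64",
}

for name, ext in sample_devices.items():
    print(name, "double precision:", supports_fp64(ext))
```

The point is that a device can be a perfectly valid OpenCL 1.0 device without listing this extension, so missing DP support alone wouldn't rule a GPU out.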
The more likely reason why the HD2000 and HD3000 series don't support OpenCL is that their memory structure is different. nVidia DX10 GPUs have a local memory store in each Streaming Multiprocessor (SM), allowing groups of 8 Stream Processors (SPs) to share data. If I'm not mistaken, the HD2000 and HD3000 series don't have this local memory store shared between small groups of SPs and instead have a single data store shared by all SPs, which I guess may be less efficient. OpenCL is reportedly closer to nVidia's CUDA than ATI's CTM, so it wouldn't surprise me if OpenCL's memory model is closer to nVidia's GPU structure. In any case, ATI seems to agree that this is the way to go, since the HD4000 series has local memory stores shared between groups of 16 (5-way) SPs. This would explain why the HD4000 series is OpenCL compatible.
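To show why a per-group local store matters, here is a rough sketch of the usual pattern it enables: each work-group copies its slice of the data into its own scratch buffer and reduces it in a few halving steps, without touching memory shared by the whole chip. This is only a sequential CPU simulation in Python, not GPU code, and the function name `group_sums` and the power-of-two group size are my own assumptions for the illustration.

```python
def group_sums(data, group_size):
    """Sum each work-group's slice of data using a per-group scratch buffer."""
    if group_size <= 0 or group_size & (group_size - 1):
        raise ValueError("group_size must be a power of two")
    results = []
    for start in range(0, len(data), group_size):
        # "Local memory": visible only to this group. A short final
        # slice is padded with zeros so the halving steps still work.
        chunk = data[start:start + group_size]
        local = list(chunk) + [0] * (group_size - len(chunk))
        stride = group_size // 2
        while stride > 0:
            # On a GPU the work-items of the group would perform this
            # step together and then wait at a barrier.
            for i in range(stride):
                local[i] += local[i + stride]
            stride //= 2
        results.append(local[0])
    return results


print(group_sums(list(range(16)), 8))  # prints [28, 92]
```

With a group size of 8, each group needs only 3 halving steps inside its own buffer. Without a store shared by small groups, that intermediate traffic would presumably have to go through the store shared by every SP.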
Also, the HD4000 series still being slower than nVidia GPUs in OpenCL doesn't surprise me. This has been the case in Folding@home, even when the ATI GPUs are running their own native CTM code. nVidia seems to have put more effort into designing their GPUs for GPGPU operations, since they are increasingly promoting them as competition to CPUs, which nVidia lacks, whereas AMD already makes CPUs and so doesn't have the same motivation. The 9600M GT being faster than the GT120 is likely, since the GT120 is basically a rebranded 9500GT, which is a budget GPU. Apple's $150 price for the GT120 is about 3 times what the PC version is worth.
On another note, if the 9400M and 9600M GT can both do OpenCL in parallel, which I was hoping for, can the 9600M GT do OpenGL graphics while the 9400M does OpenCL physics? That'd be great for games, since you'd get more realistic physics without sacrificing anything except power and heat, as the 9400M would have been doing nothing anyway. This type of parallel GPU usage would be more worthwhile for Apple to focus on, since it's more flexible than SLI. Similarly, a Mac Pro could use, say, an HD4870 for graphics and a second GT120 for physics.
EDIT: I remembered that the HD3000 series supports double precision floats as well (like the HD4000 and GTX200), but again, that doesn't appear to be the reason for the lack of OpenCL support.