I tried Compressor too, it was also "slow" and not using all cores....
A few minor versions ago I checked Compressor export speed on 1080p/30 H.264, and using single-pass it was as fast as FCP X. Now I just spent an hour trying many different Compressor export options, and for some reason it's no longer that fast. I don't know if FCP got faster or Compressor got slower, or if I'm doing something different.
This matters to anyone transcoding a lot of material, because export time can differ hugely depending on the options chosen. For a 2 min. test video, I got the export times below. Note that Handbrake to MPEG-4 was the fastest of all, despite it supposedly not using Quick Sync.
Input file: 2 min 1080p/30 .mp4 file produced by FCP X, original material: Canon 5D3 1080p/30 IPB codec.
Handbrake, codec=MPEG-4, CBR, QP=30: 0:21
FCP X, master file, 1080p, H.264 "faster encode": 0:33
Handbrake, codec=H.264 (x264): 0:46
Compressor H.264 1080p custom preset, single pass H.264: 1:13
FCP X, master file, 1080p, H.264 "better quality": 5:38
Compressor H.264 1080p custom preset, multi-pass: 6:20
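To put those numbers side by side, here is a rough Python sketch that converts each time to seconds and shows how it compares to real time (the clip is 2 minutes) and to the fastest run. The labels are shortened from my list above, and the helper names (to_seconds, summarize) are just made up for this example. By this arithmetic the slowest setting took roughly 18 times as long as the fastest.

```python
# Export times from the list above, as (label, "m:ss") pairs.
RESULTS = [
    ("Handbrake MPEG-4, CBR, QP=30", "0:21"),
    ("FCP X H.264 faster encode", "0:33"),
    ("Handbrake H.264 (x264)", "0:46"),
    ("Compressor H.264 single pass", "1:13"),
    ("FCP X H.264 better quality", "5:38"),
    ("Compressor H.264 multi-pass", "6:20"),
]

CLIP_SECONDS = 120  # the test clip is 2 minutes long


def to_seconds(mmss):
    """Convert an 'm:ss' string such as '5:38' to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def summarize(results, clip_seconds=CLIP_SECONDS):
    """Return (label, seconds, speed vs real time, time vs fastest) rows."""
    fastest = min(to_seconds(t) for _, t in results)
    rows = []
    for label, t in results:
        secs = to_seconds(t)
        rows.append((label, secs, clip_seconds / secs, secs / fastest))
    return rows


if __name__ == "__main__":
    for label, secs, realtime, vs_fastest in summarize(RESULTS):
        print(f"{label:32s} {secs:4d} s  "
              f"{realtime:5.2f}x real time  {vs_fastest:5.1f}x fastest")
```

Keep in mind this only compares speed. The outputs use different codecs and quality settings, so the files themselves are not equivalent.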
Re not using all cores: do you mean not all 8 virtual cores on a hyperthreaded 4-core CPU?
There are cases where running 8 threads on such a CPU will cause substandard performance because two CPU-bound threads are competing for each real core and causing "cache thrashing". Whether this happens depends on the exact characteristics of those threads.
I think the OS X thread scheduler is hyperthread-aware and may be intentionally scheduling threads on alternate virtual cores in this specific case.
In *each* of my above tests on FCP/Compressor, iStat Menus showed alternate virtual cores were used. So CPU activity was about the same, but render time varied greatly. Handbrake showed all 8 threads heavily used.
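If you are not sure how many of the "cores" in your CPU monitor are real and how many are hyperthreaded, a small Python sketch like this can tell you. It assumes OS X, where sysctl exposes the hw.physicalcpu key; on any other system the physical count simply comes back as None. The function names are my own, not from any tool.

```python
import os
import subprocess
import sys


def logical_cores():
    """Number of logical (virtual) cores the OS reports."""
    return os.cpu_count() or 1


def physical_cores():
    """Number of physical cores on OS X via sysctl, or None elsewhere."""
    if sys.platform != "darwin":
        return None
    try:
        out = subprocess.run(
            ["sysctl", "-n", "hw.physicalcpu"],
            capture_output=True, text=True, check=True,
        ).stdout
        return int(out.strip())
    except (OSError, subprocess.CalledProcessError, ValueError):
        # sysctl missing, key unavailable, or unexpected output
        return None


if __name__ == "__main__":
    logical = logical_cores()
    physical = physical_cores()
    print(f"logical cores:  {logical}")
    if physical is None:
        print("physical cores: unknown on this platform")
    else:
        print(f"physical cores: {physical}")
        if logical > physical:
            print("hyperthreading appears to be enabled")
```

On a hyperthreaded 4-core machine I would expect this to report 8 logical and 4 physical cores, which matches the 8 bars shown in a CPU monitor.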
In general, GPU-accelerated rendering should not be faster than Quick Sync. While Quick Sync requires the on-chip HD graphics, it's not really GPU-assisted. Rather, the hardware resources QS needs are tied into the on-chip GPU. Quick Sync is essentially an on-chip custom ASIC designed specifically for transcoding. It only works for single-pass MPEG-2 and H.264, but for those cases it's generally faster than software or GPU-assisted methods. The problem is that software usually doesn't indicate whether QS is being used, and if incompatible parameters are chosen, the task can silently fall back to software-based rendering. I can't explain why Handbrake is so fast to MPEG-4, unless maybe it's using QS. Windows Handbrake supposedly does, OS X Handbrake supposedly does not, but who knows?
Larry Jordan article on using Quick Sync in Compressor:
http://www.larryjordan.biz/compressor-4-1-hardware-acceleration/