Drivers are actually the least significant factor here. Driver performance matters in 3D applications, where internal driver inefficiencies in implementing the complex API can severely reduce performance.
Desktop compositing uses only a small subset of the GPU's features and does not involve thousands of API calls per second, as typical 3D applications do. And Apple has its own optimisations to ensure that the desktop compositor has very fast access to video RAM (I do hope that these are not broken).
The culprit is most likely the combination of some inefficiencies in Apple's own HiDPI implementation + bad application code. Take the App Store as the prime example of a really sluggish app on the retina MBP. The problem here is that it handles resize actions in a very inefficient way (I have no idea what they do, but they seem to recalculate/redraw the whole view multiple times when resized). It is actually already sluggish in non-HiDPI mode - but it only becomes apparent with HiDPI, where it has to render 4x the pixels.
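To put rough numbers on the resize problem, here is a minimal sketch. The window size and the number of redraw passes are made up for illustration (I don't know what the App Store really does internally); the only firm part is that HiDPI at 2x scale means 4x the pixels for the same window.

```python
# Rough cost model for one resize event. All inputs are hypothetical.

def pixels_drawn(width_pts, height_pts, scale, passes):
    """Pixels written when a view redraws itself `passes` times.

    width_pts/height_pts are in points; `scale` is the backing scale
    factor (1 for normal mode, 2 for HiDPI).
    """
    return width_pts * height_pts * scale * scale * passes

# Hypothetical 1000x700 point window.
sane_normal = pixels_drawn(1000, 700, scale=1, passes=1)    # one redraw, non-HiDPI
sloppy_hidpi = pixels_drawn(1000, 700, scale=2, passes=3)   # assumed 3 redraws, HiDPI

print(f"well-behaved, non-HiDPI: {sane_normal:,} pixels per resize event")
print(f"sloppy, HiDPI:           {sloppy_hidpi:,} pixels per resize event")
print(f"ratio: {sloppy_hidpi / sane_normal:.0f}x")
```

The point is only that the waste multiplies: redundant redraws that go unnoticed at 1x get four times more expensive at 2x.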
Basically, the hardware has undergone some incredible advancements in the last few years, so many programmers became lazy. It's insane how many resources some applications need to perform really mundane tasks.
P.S. And yes, the HD 4000 has enough horsepower/bandwidth/fillrate for retina, as I have pointed out in multiple threads already. Its benchmarked pixel fillrate is way above 1 Gpixel/sec, while even the 15" retina has 'only' about 5 megapixels (2880x1800). This basically means that the card is easily capable of more than 150 full-screen retina updates per second. In practice, only the modified display regions ever get updated, so you usually need only a fraction of that power. The only time it gets tight is when you have constant large-scale updates, like in movies (less of a problem, as we usually need less than 30 fps here) or games.
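For anyone who wants to check the arithmetic, here is a quick back-of-the-envelope sketch. It assumes the conservative 1 Gpixel/sec figure as a lower bound and that every pixel is written exactly once per update; real compositing stacks several layers, so treat the result as an upper limit, not a measurement.

```python
# Back-of-the-envelope check of the fillrate argument.
FILLRATE_PIXELS_PER_SEC = 1_000_000_000          # assumed lower bound (1 Gpixel/sec)
RETINA_15_WIDTH, RETINA_15_HEIGHT = 2880, 1800   # native 15" retina MBP resolution


def full_screen_updates_per_sec(fillrate, width, height):
    """How many times per second every pixel could be written once."""
    return fillrate / (width * height)


def fillrate_fraction(fillrate, width, height, fps, dirty_fraction=1.0):
    """Share of the fillrate used when `dirty_fraction` of the screen
    is redrawn `fps` times per second."""
    return (width * height * dirty_fraction * fps) / fillrate


pixels = RETINA_15_WIDTH * RETINA_15_HEIGHT
updates = full_screen_updates_per_sec(
    FILLRATE_PIXELS_PER_SEC, RETINA_15_WIDTH, RETINA_15_HEIGHT)
movie_load = fillrate_fraction(
    FILLRATE_PIXELS_PER_SEC, RETINA_15_WIDTH, RETINA_15_HEIGHT, fps=30)

print(f"screen pixels:             {pixels:,}")
print(f"full-screen updates/sec:   {updates:.0f}")
print(f"full-screen video, 30 fps: {movie_load:.1%} of fillrate")
```

Even with full-screen updates at 30 fps, a single pass over the screen uses well under a fifth of the assumed fillrate; partial updates (a small `dirty_fraction`) need far less.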