You're still moving around those same blocks because the computer still has to determine which pixels to turn on and off. A scaled image is still an image; it doesn't matter to the computer if two pixels are logically joined, because it still has to figure out which pixels to change.
No, that's not the whole picture. To produce the final output, OS X first draws the image to the 2x-scaled backing buffer and then downsamples the backing buffer (or the affected portion of it) to the framebuffer. The framebuffer is always 2880x1800 (let's assume the 15" model for the sake of simplicity). Suppose you want to draw an image that is 100x100 points in size. On the "best for retina" setting, you just need to draw it to a 200x200 destination in the 2880x1800 backing buffer, which has the same resolution as the framebuffer, so you are done. On the "max resolution" (1920x1200) setting, however, you need to draw it to a 200x200 destination in the 3840x2400 backing buffer and then downsample it to a 150x150 destination in the framebuffer (that's one additional resampling operation on about 22k pixels). For the "1680x1050" setting, you draw to a 3360x2100 backing buffer and then resample to a 171x171 destination in the framebuffer (about 29k pixels to be resampled).
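To make the arithmetic above easier to check, here is a rough Python sketch. It only encodes the model I'm assuming (draw at 2x into a backing buffer, then scale that buffer to the 2880x1800 panel); the function name and parameters are made up for illustration and say nothing about how OS X actually implements this.

```python
# Assumed panel: 15" Retina MacBook Pro, 2880 physical pixels wide.
PANEL_W = 2880


def resampled_image(setting_w, points=100):
    """Return (side, resampled_pixels) for a points x points image.

    setting_w is the width of the chosen scaled resolution in points
    (1440 for "best for retina", 1680, 1920, ...).
    Assumption: the backing buffer is 2x the setting and is then
    scaled down to the panel width.
    """
    backing_w = 2 * setting_w          # e.g. 1920 -> 3840
    drawn_side = 2 * points            # image side in the backing buffer
    # Side of the image once the backing buffer is scaled to the panel.
    side = round(drawn_side * PANEL_W / backing_w)
    if backing_w == PANEL_W:
        return side, 0                 # no extra resampling pass needed
    return side, side * side


for w in (1440, 1680, 1920):
    print(w, resampled_image(w))
```

For 1920 this gives a 150x150 destination (22,500 pixels) and for 1680 a 171x171 destination (29,241 pixels), which is where the 22k and 29k figures come from.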
Which is funny, because it means that the operation is actually cheaper when drawing to a higher-resolution target.

Then again, it makes sense, as images appear "smaller" at higher resolutions and thus cover less actual pixel area. Conversely, working at a higher resolution means that you probably redraw larger areas more frequently (e.g. when watching a video), which costs more in return. E.g. drawing a full frame on the 1920x1200 setting means completely filling the 3840x2400 backing buffer and then converting it to 2880x1800: that's 14.4 million ROPs. With the 1680x1050 setting, that's "just" 12.2 million ROPs.
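The full-frame numbers can be reproduced the same way. Again, this is just a sketch of the assumed model (fill the 2x backing buffer, then write every pixel of the 2880x1800 framebuffer); `full_frame_ops` is a hypothetical helper, not anything from OS X.

```python
# Assumed panel: 15" Retina MacBook Pro.
PANEL = (2880, 1800)


def full_frame_ops(setting_w, setting_h):
    """Pixel writes needed to redraw one full frame.

    Assumption: one write per backing-buffer pixel, plus one write per
    framebuffer pixel when the backing buffer has to be downsampled.
    """
    backing = (2 * setting_w, 2 * setting_h)
    ops = backing[0] * backing[1]          # fill the backing buffer
    if backing != PANEL:
        ops += PANEL[0] * PANEL[1]         # downsample into the framebuffer
    return ops


for w, h in ((1440, 900), (1680, 1050), (1920, 1200)):
    print(f"{w}x{h}: {full_frame_ops(w, h) / 1e6:.1f} million")
```

Under these assumptions the native 1440x900 setting costs about 5.2 million writes per frame, since the backing buffer can be used as-is.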
Actually displaying the image on the screen is "free" with respect to these considerations, because it is always the same no matter which resolution is used. The final image resolution is always 2880x1800.
P.S. BTW, most of this is my speculation. I assume that OS X maintains a separate framebuffer and backing buffer. But they could also use a built-in hardware resolution converter and just use the backing buffer as the framebuffer. In that case the conversion would be much cheaper.