I don't buy this argument (as a developer). Video playback is taking data, decoding it, and writing the decoded frames out to a location in memory. That's the long and short of it. The key difference between playing in its own window and playing in a browser window is that the browser gives me a different location to draw to... or the browser may just be doing the drawing for me.
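To make that concrete, here's a rough Python sketch of what I mean. It is a toy, not how any real player works: `decode_frame`, `play`, and the two buffers are names I made up for illustration, and the "decoder" just expands one byte into a solid-color frame. The point is only that the playback loop is identical whether the destination is a buffer the player owns or one somebody else hands it.

```python
def decode_frame(packet: bytes, width: int, height: int) -> bytes:
    """Stand-in decoder (hypothetical): turn one byte into a solid-color frame."""
    return bytes([packet[0]]) * (width * height)


def play(packets, surface: bytearray, width: int, height: int) -> int:
    """Decode each packet and write the pixels into whatever surface we were given."""
    frames = 0
    for packet in packets:
        pixels = decode_frame(packet, width, height)
        surface[:] = pixels  # the "draw" is just a write into destination memory
        frames += 1
    return frames


WIDTH, HEIGHT = 4, 2
packets = [b"\x10", b"\x80", b"\xff"]

own_window = bytearray(WIDTH * HEIGHT)       # buffer the player owns
browser_surface = bytearray(WIDTH * HEIGHT)  # buffer handed over by a host (assumed)

play(packets, own_window, WIDTH, HEIGHT)
play(packets, browser_surface, WIDTH, HEIGHT)
```

Same loop, same work per frame; only the address being written to changes.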
Now, it's more work to include overlays and that sort of thing on top of video (something both Flash and QT do). Flash pays more to do that because of the nature of Flash: it's more akin to a Java VM than it is to native code like QuickTime. Even though the video decoding can (and must) run natively, the overlays and everything else added on top don't. In the end, that's where the extra CPU load comes from, if we assume the two decoders are equal. They may or may not actually be equal in practice, given that Apple's decoders are hardware-assisted and Flash isn't always able to use them to full effect on OS X.