Syncing is quite trivial to do with the sync objects. Same goes for your comment about reusing data from the previous frame - as you would share all the objects between your contexts anyway, this is also quite trivial.
I'd shy away from telling people things are "trivial". I don't think you meant it that way, but it sounded very dismissive. I don't code on Feral games very much, being more production-based in recent years, but I do spend a lot of time working with driver teams at AMD, plus Apple's GL team, etc. After looking at this problem a few times (the first time I discussed it as a possible option was with the G4 Macs years ago, so I have thought about it more than once now) it is possible, but hardly trivial. Yours is a good example of a simple game engine that you could use to test multiple GPUs. When you get a complex modern game with physics, multiple-pass rendering and dynamic shader generation with changing variables, the syncing problem suddenly gets a lot more complex.
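To be fair, the basic handshake really is simple on its own. A minimal sketch of the fence dance between two shared contexts is below, assuming GL 3.2 / ARB_sync is available; it is untested on a dual-GPU Mac, and the comments mark where the real engine work would go:

```c
#include <OpenGL/gl3.h>  /* GL 3.2 core headers on OS X */

/* Context B (second GPU) finishes producing a shared texture, then
 * publishes a fence so context A knows the result is ready to read. */
GLsync publish_frame(void)
{
    /* ... draw into the shared FBO/texture here ... */
    GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    glFlush();  /* make sure the fence is actually submitted */
    return fence;
}

/* Context A (the GPU driving the display) waits on that fence before
 * sampling the shared texture. glWaitSync blocks the GPU, not the CPU. */
void consume_frame(GLsync fence)
{
    glWaitSync(fence, 0, GL_TIMEOUT_IGNORED);
    glDeleteSync(fence);
    /* ... now safe to read what context B rendered ... */
}
```

The fence itself is the easy part; the complexity I am talking about comes when dozens of dynamically generated resources are produced and consumed at different points in the frame.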
The best generic option I have thought of that avoids these issues would be to render half the frame on each card and DMA one half over to the other card before the final render to screen. However, the best method would vary from game to game depending on the game engine, and I would need a MacPro and some spare time to test various methods and ideas before stating which is best, as reality and theory are often miles apart.
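To make the split-frame idea concrete, the per-GPU half would look roughly like this (a sketch only; both contexts share the scene data and camera, and only the scissor rectangle differs):

```c
#include <OpenGL/gl3.h>

/* Render one half of the frame on one GPU. gpu_index 0 takes the
 * bottom half, gpu_index 1 the top half. The viewport stays full
 * size so geometry crossing the seam still rasterises correctly. */
void render_half(int gpu_index, int width, int height)
{
    int half = height / 2;
    int y    = (gpu_index == 0) ? 0 : half;

    glViewport(0, 0, width, height);   /* full-frame projection */
    glEnable(GL_SCISSOR_TEST);
    glScissor(0, y, width, half);      /* but only fill our half */
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    /* ... issue the scene draw calls here ... */
    glDisable(GL_SCISSOR_TEST);
}
```

The real question is then how the half that lives on the second card gets back to the card driving the display, which is the copy/DMA point below.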
As only one GPU can output the frame, you need to render to an FBO on the inactive one and then copy it to the main framebuffer on the active GPU.
Actually, based on a few tests, that has some unwanted overhead. To get the best performance you can/should DMA the memory between the cards, but without a MacPro I can't be sure the DMA method works as expected. However, this is likely how Apple gets its performance in FCP.
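For reference, since OS X has no public "copy this VRAM straight to that card" call that I know of, the portable fallback is a pixel-buffer-object round trip along the lines below. Whether the driver turns it into a genuine card-to-card DMA is exactly what I can't verify without the hardware. The PBO is assumed to be allocated beforehand and shared between the contexts, and a fence (as in the earlier sketch) still has to sit between the two halves:

```c
#include <OpenGL/gl3.h>

/* On the inactive GPU's context: read the finished half into a PBO.
 * BGRA / UNSIGNED_INT_8_8_8_8_REV is the traditionally fast path on
 * Apple's drivers. The zero offset means "into the bound PBO". */
void read_back_half(GLuint pbo, int width, int half_height)
{
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
    glReadPixels(0, 0, width, half_height, GL_BGRA,
                 GL_UNSIGNED_INT_8_8_8_8_REV, (void *)0);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
}

/* On the active GPU's context: upload from the shared PBO into a
 * texture that the final composite pass samples. */
void upload_half(GLuint pbo, GLuint tex, int width, int half_height)
{
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, half_height,
                    GL_BGRA, GL_UNSIGNED_INT_8_8_8_8_REV, (void *)0);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
}
```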
But it's certainly doable. And it is probably better than what the driver can do - after all, you know best how you do your rendering and can adjust your syncing appropriately.
I never said it was not doable; in fact, I said the opposite.
I did mention that on the PC the drivers do this work for you; you don't have to write it yourself inside the game engine(s). They do this for a reason, as doing it in-game is more complex and more work for the developers. Having it in the drivers makes it a lot easier for everyone: the game does not need to do all the shuffling, the drivers can do it. It might not be quite as fast as a hand-written and optimised solution per game, but it will work in all games.
this whole story rises and falls with how well Apple's OpenGL supports shared contexts
Partly you also have to take the game engines into account. The problem is not writing a game engine from scratch that supports multiple GPUs, but taking a very complex game engine that does not support multiple GPUs and modifying it to work with them. That is harder, as many engineering decisions have already been made before you start, and plenty of them are serial in nature.
Some things, like rendering on one card and displaying on another, can be done, and I have seen Feral games do this when testing various library and OS X features like windowed mode. If it were all down to a few simple render commands and DMAing the final frame over, it would be fairly trivial, but that is the easy bit.
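For anyone curious, the per-GPU plumbing on OS X goes through CGL virtual screens, roughly as below. This is a sketch from memory and untested; in particular it assumes virtual screens 0 and 1 map to the two GPUs you want, which in real code you would confirm by querying renderer IDs rather than trusting the order:

```c
#include <OpenGL/OpenGL.h>  /* CGL */

/* Create two shared contexts, one pinned to each virtual screen
 * (roughly, each GPU). kCGLPFAAllowOfflineRenderers makes GPUs with
 * no display attached visible to the pixel format. */
CGLError make_contexts(CGLContextObj *ctxA, CGLContextObj *ctxB)
{
    CGLPixelFormatAttribute attrs[] = {
        kCGLPFAAllowOfflineRenderers,
        kCGLPFAAccelerated,
        (CGLPixelFormatAttribute)0
    };
    CGLPixelFormatObj pix;
    GLint nscreens = 0;
    CGLError err = CGLChoosePixelFormat(attrs, &pix, &nscreens);
    if (err != kCGLNoError)
        return err;
    if (nscreens < 2) {                 /* fewer than two GPUs visible */
        CGLDestroyPixelFormat(pix);
        return kCGLBadMatch;
    }

    err = CGLCreateContext(pix, NULL, ctxA);       /* first context */
    if (err == kCGLNoError)
        err = CGLCreateContext(pix, *ctxA, ctxB);  /* shares with A */
    if (err == kCGLNoError) {
        CGLSetVirtualScreen(*ctxA, 0);             /* pin to GPU 0  */
        CGLSetVirtualScreen(*ctxB, 1);             /* pin to GPU 1  */
    }
    CGLDestroyPixelFormat(pix);
    return err;
}
```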
I said in my post I am sure you will be able to get gains, but as the potential benefits for your average Mac user are not that high (the main benefit is on a rare, super-high-performance Mac), there are much more productive speed boosts to be found in other areas. The idea of using the iGPU is one I looked at back when the HD 3000 was bundled in MBP machines, but the overhead and weak iGPU performance looked to outweigh the benefits of using two GPUs. With the new HD 5000 GPUs and the improved Intel drivers in Mavericks it might be worth another look once other performance boosts that affect all users are completed.
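If anyone wants to weigh up the iGPU option on their own machine first, CGL can enumerate the renderers and their VRAM before you commit to anything. A small sketch (kCGLRPVideoMemoryMegabytes needs 10.7 or later):

```c
#include <OpenGL/OpenGL.h>
#include <stdio.h>

/* Print every renderer the system exposes, with whether it is online
 * (driving a display) and how much VRAM it reports. The 0xFFFFFFFF
 * display mask asks about all displays. */
void list_renderers(void)
{
    CGLRendererInfoObj info;
    GLint count = 0;
    if (CGLQueryRendererInfo(0xFFFFFFFF, &info, &count) != kCGLNoError)
        return;

    for (GLint i = 0; i < count; ++i) {
        GLint rid = 0, online = 0, vramMB = 0;
        CGLDescribeRenderer(info, i, kCGLRPRendererID, &rid);
        CGLDescribeRenderer(info, i, kCGLRPOnline, &online);
        CGLDescribeRenderer(info, i, kCGLRPVideoMemoryMegabytes, &vramMB);
        printf("renderer 0x%x: online=%d, vram=%d MB\n", rid, online, vramMB);
    }
    CGLDestroyRendererInfo(info);
}
```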
From a programmer's standpoint, it is exactly the same thing. The code is the same (except performance tweaks). The only difference is the performance characteristics. You can certainly use the iGPU to prototype a multi-GPU rendering engine.
From a programmer's standpoint, optimising for a PlayStation and a Mac is exactly the same thing in theory.

However, in reality it's different. Same with your example: in theory it's the same thing and uses the same code; in reality, getting it to give you a usable performance advantage with correct rendering is another matter.
You can certainly use the iGPU to prototype a multi-GPU rendering engine.
Yep, but converting a non-multi-GPU rendering engine into a multi-GPU rendering engine without breaking or rewriting the game is a different and more complex problem to solve.
I know this post has mostly been highlighting issues and problems with your suggestions, but I am not dismissing your views or the very useful commentary for others reading this thread. I am glad someone has done some reading up and raised a few questions. My aim is to highlight that although designing a game engine to support multiple GPUs is not a huge problem (it's still more complex than your "trivial" suggests), converting a game engine that was not designed to support multiple GPUs into one that can, without any of the driver tools available on other platforms to assist with this, is a much more complex and harder problem than making a game engine from scratch.
What's more, as the benefits would only boost users with already super-powerful machines, it is a lower priority for developer time. Time spent making a MacPro (or even a MacBookPro) faster will not benefit users with lower-performance Macs (or single GPUs). Making the fast machines faster at the expense of the slower ones is not a good decision in terms of allocating programmer time. This is why multi-GPU support has historically been on the "if we have time" list, not the "important to fix before release" list. We constantly evaluate the situation, but that's where I am coming from, from a games developer's/publisher's standpoint.
Edwin