Of course, the devil is in the details as to how one finds an 'anchor point'.
The way it was explained to me, each frame is run through a calculation using Fourier Transforms (FT), and the FT result is basically what's used as this anchor. The difference between the FT results of consecutive frames indicates how far the 2nd frame shifted relative to frame #1 (think of simple subtraction to get the stabilization offset). Then the same is done for frame #3, #4, #5, etc.
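For anyone curious what the Fourier part can look like in code: one well-known FFT-based method is phase correlation. Instead of literally subtracting two numbers, it multiplies one frame's transform by the complex conjugate of the other's, normalizes away the magnitude, and inverse-transforms; the result peaks at the (dx, dy) shift. I don't know that any particular stabilizer does exactly this. The sketch below is pure Python with a deliberately naive DFT, and it assumes small grayscale frames given as lists of lists, translation only (no rotation), and circular wrap-around at the edges. The function names are made up for illustration.

```python
import cmath


def dft2(img, inverse=False):
    """Naive separable 2-D DFT: fine for tiny demo frames, far too slow for video."""
    rows, cols = len(img), len(img[0])
    sign = 1 if inverse else -1

    def dft1(vec):
        n = len(vec)
        return [sum(vec[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
                    for k in range(n))
                for m in range(n)]

    # Transform every row, then every column of the row-transformed result.
    tmp = [dft1(row) for row in img]
    by_col = [dft1([tmp[r][c] for r in range(rows)]) for c in range(cols)]
    out = [[by_col[c][r] for c in range(cols)] for r in range(rows)]
    if inverse:
        out = [[v / (rows * cols) for v in row] for row in out]
    return out


def phase_correlation_shift(frame1, frame2):
    """Estimate (dx, dy) such that frame2 is roughly frame1 moved right dx, down dy."""
    rows, cols = len(frame1), len(frame1[0])
    f1, f2 = dft2(frame1), dft2(frame2)
    # Normalized cross-power spectrum: keep only the phase difference.
    cross = []
    for r in range(rows):
        row = []
        for c in range(cols):
            p = f2[r][c] * f1[r][c].conjugate()
            row.append(p / abs(p) if abs(p) > 1e-12 else 0)
        cross.append(row)
    corr = dft2(cross, inverse=True)
    # The inverse transform is (ideally) a single spike located at the shift.
    _, dy, dx = max((corr[r][c].real, r, c)
                    for r in range(rows) for c in range(cols))
    # Indices past the halfway point wrap around to negative shifts.
    if dy > rows // 2:
        dy -= rows
    if dx > cols // 2:
        dx -= cols
    return dx, dy
```

A real implementation would typically use an FFT library (O(N log N) instead of this O(N^3) loop) and a window function to reduce edge effects, and estimating rotation needs extra work (e.g., a log-polar transform) that isn't shown here.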
To illustrate by analogy, envision the field of view of one frame as a single playing card: the King of Clubs.
Take the King of Clubs (Card #1) and place the Queen of Diamonds (Card #2; yes, the next frame of video) on top of it ... randomly.
This randomness is your "shake". In geometric terms, we can quantify the misalignment of the two cards as an (X, Y) offset plus a rotation.
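To make "an (X, Y) offset plus a rotation" concrete, here's a minimal sketch of that misalignment as a 2-D rigid transform, along with its inverse, which is the correction a stabilizer would apply to undo it. The Misalignment class is hypothetical, and it assumes the rotation is about the frame origin and measured in radians.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Misalignment:
    """Hypothetical record of how card #2 sits relative to card #1."""
    dx: float
    dy: float
    theta: float  # radians, counter-clockwise in y-up axes, about the origin

    def apply(self, x, y):
        """Where point (x, y) of card #1 lands on the table: rotate, then shift."""
        c, s = math.cos(self.theta), math.sin(self.theta)
        return c * x - s * y + self.dx, s * x + c * y + self.dy

    def inverse(self):
        """The correction that undoes this misalignment."""
        c, s = math.cos(self.theta), math.sin(self.theta)
        # If p' = R p + t, then p = R^T p' - R^T t (R^T undoes the rotation).
        return Misalignment(dx=-(c * self.dx + s * self.dy),
                            dy=-(-s * self.dx + c * self.dy),
                            theta=-self.theta)
```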
Now if we look at the cards, we can see part of the King sticking out from under the Queen; that's the misalignment. To make a stabilized view, we pull out a knife and cut away (think of this as a "crop") any part of card #1 (the King) that sticks out from under card #2 (the Queen). But we're only halfway: flip the stack over, and cut away any part of card #2 (the Queen) that sticks out from under the already-trimmed card #1 (the King). Now you have two pieces of card that are identically shaped, but neither of which is whole anymore. Congratulations, you now have a stabilized two-frame movie.
Now toss on the Jack of Spades (Card #3) and cut again, both sides. Then the Ace of Hearts (Card #4), cut both sides again, etc.
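All that two-sided cutting boils down to intersecting rectangles. A minimal sketch, assuming translation only (rotation ignored) and integer pixel offsets: first chain the frame-to-frame shifts into positions relative to card #1, then keep only the region every card covers. The function names are hypothetical.

```python
from itertools import accumulate


def positions_from_deltas(deltas):
    """Chain frame-to-frame shifts into positions relative to card #1 at (0, 0)."""
    xs = [0, *accumulate(dx for dx, _ in deltas)]
    ys = [0, *accumulate(dy for _, dy in deltas)]
    return list(zip(xs, ys))


def common_crop(width, height, positions):
    """Region (left, top, right, bottom) covered by every card, or None.

    Cutting "both sides" for each new card is the same as intersecting all the
    card rectangles. Rotation is ignored in this sketch.
    """
    left = max(x for x, _ in positions)
    top = max(y for _, y in positions)
    right = min(x + width for x, _ in positions)
    bottom = min(y + height for _, y in positions)
    if right <= left or bottom <= top:
        return None  # the cards no longer overlap at all
    return left, top, right, bottom
```

For example, frame-to-frame shifts of (5, -3), (-9, 5) and (6, 4) on a 640x480 clip place the four cards at (0, 0), (5, -3), (-4, 2) and (2, 6), and the surviving region is (5, 6, 636, 477): 631x471 pixels, already smaller than the original and no longer exactly 4:3.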
Now take a look at your four-card stack ... it's probably smaller than the two-card stack was, isn't it?
Plus, you've also lost your 4:3 (or 16:9) aspect ratio, so you can see that more cropping to get back to 4:3 (or 16:9) will eventually be required. That is done at the end, to minimize data loss.
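That final aspect-ratio trim can be sketched as finding the largest centered 4:3 (or 16:9) rectangle inside whatever survived the cuts. This assumes integer pixel coordinates; fit_aspect is a made-up name, not any tool's API.

```python
from fractions import Fraction


def fit_aspect(left, top, right, bottom, aspect=Fraction(4, 3)):
    """Largest centered rectangle with width/height close to aspect inside the crop.

    Coordinates are integer pixels; sizes are rounded down to whole pixels, so
    the ratio is exact only when the numbers divide evenly.
    """
    w, h = right - left, bottom - top
    if Fraction(w, h) > aspect:
        new_w, new_h = int(h * aspect), h   # too wide: trim the sides
    else:
        new_w, new_h = w, int(w / aspect)   # too tall: trim top and bottom
    x0 = left + (w - new_w) // 2
    y0 = top + (h - new_h) // 2
    return x0, y0, x0 + new_w, y0 + new_h
```

For example, a surviving region of (5, 6, 636, 477), which is 631x471 pixels, gets trimmed to (6, 6, 634, 477), a 628x471 window that is exactly 4:3.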
This is where the very basics end and the post-processing becomes even more interesting: how does one proceed without the field of view getting smaller and smaller? One choice is to apply a moving average to the positions of this stack of images, which effectively says that we permit the center (X, Y) position of the view to shift slowly across the video. Unfortunately, the X-Acto knife analogy won't cut it anymore.
Okay, time to get a fresh deck of playing cards. Let the deck of 52 cards represent 52 frames of video. If it is a perfectly squared stack, then the video has no shake: every location on the top card has all 51 other cards underneath it, so the stabilization "crop" would be zero.
Now manipulate the deck in a couple of different ways ... but keep them smooth. For example, push the whole deck into a nice even slant at a 45 degree angle. If your knife cut has to be a vertical slice, it will take a good chunk out of the deck on both ends ... but if you allow a 1/16" offset between each card, you might not have to cut them down at all, because the knife cut can follow the slant. Go a step further and you can see that this 1/16" offset could come from a mathematical moving average: the knife cut can be vertical where it needs to be, then slant, etc. There are a lot of mathematical tricks one can apply.
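Here's a minimal sketch of the moving-average idea, for the X coordinate only (Y and rotation would be handled the same way). It assumes you already have each frame's position; the correction applied to each frame is its target (smoothed) position minus its raw position, and the largest correction is a safe crop margin. The function names, the sample data, and the simple cut-short window at the ends of the clip are my own illustrative choices, not any specific product's method.

```python
def moving_average(path, radius):
    """Centered moving average; the window is simply cut short near the ends."""
    out = []
    for i in range(len(path)):
        window = path[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out


def required_margin(path, target):
    """Largest shift any frame needs to reach its target position.

    Cropping this many pixels on every side is enough for each corrected
    frame to still cover the whole output window.
    """
    return max(abs(t - p) for p, t in zip(path, target))


# Hypothetical X positions in pixels: a steady pan of 5 px/frame plus
# alternating +/-2 px of jitter, over a 52-frame "deck".
path = [5 * i + (2 if i % 2 else -2) for i in range(52)]
locked = required_margin(path, [path[0]] * len(path))   # pin every card to card #1
smooth = required_margin(path, moving_average(path, radius=2))
print(locked, round(smooth, 1))   # prints: 259 6.3
```

Locking everything to the first card would cost a 259 px margin, while a 5-frame moving average needs about 6.3 px, and only 1.6 px away from the very ends of the clip where the window gets cut short.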
But wait, there's more: there's also a pretty common way to cheat outright, which is to "recreate" (or if you prefer, "fake") the missing data in a frame. Basically, if a frame is missing data that would force a serious chop, one can look at nearby frames (not too temporally distant, i.e., not too many cards away in our deck) and copy their data in to fill the gap (and minimize the "crop"). The catch is that you have to hope nobody notices that the added data comes from a different moment in time.
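A deliberately naive sketch of that cheat, assuming the frames have already been shifted into the stabilized coordinate system and that None marks a pixel the frame has no data for. Each gap is filled from the closest frame in time (within a hypothetical max_distance) that does have a pixel there. Real tools are far more careful about blending and motion; this only shows the idea.

```python
def fill_from_neighbors(frames, max_distance=3):
    """Fill None pixels in aligned frames using the nearest frame in time.

    frames: list of 2-D lists (same size), None where data is missing.
    Sources are always the original frames, so fills never chain.
    """
    filled = []
    for t, frame in enumerate(frames):
        new = [row[:] for row in frame]
        for r, row in enumerate(frame):
            for c, value in enumerate(row):
                if value is not None:
                    continue
                # Search outward in time: t-1, t+1, t-2, t+2, ...
                for d in range(1, max_distance + 1):
                    for u in (t - d, t + d):
                        if 0 <= u < len(frames) and frames[u][r][c] is not None:
                            new[r][c] = frames[u][r][c]
                            break
                    if new[r][c] is not None:
                        break
        filled.append(new)
    return filled
```

Keeping max_distance small is the code version of "not too many cards away in our deck": the further you reach, the more likely the borrowed pixels show something that has since moved.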
A stand-alone video stabilizer I've tried that's probably worth a look (they have a free demo) is iStabilize. FWIW, I don't know specifically how it works or what tricks it uses (more might be in their documentation), but I do recall a pretty interesting UI that sort of previewed what it was going to return from an analyzed clip, and it offered several different settings to play with.
-hh