If you are zoomed into a larger, stationary clip, you can add your text on top of it and then keyframe the position of both the text and the video clip by exactly the same amount. I assume this is what dorsal meant by "panning across". It'll look rock solid because, well, it is.
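To see why identical keyframes keep the text locked in place, here is a small arithmetic sketch in Python. It is not FCP X's API, only the math behind the idea. The keyframe values, the text offset, and the function names are made up for illustration, and linear interpolation is an assumption (the same holds for any easing as long as both layers use the same curve).

```python
def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t


def position_at(keyframes, frame):
    """Interpolate an (x, y) position between exactly two keyframes."""
    (f0, p0), (f1, p1) = keyframes
    t = (frame - f0) / (f1 - f0)
    return (lerp(p0[0], p1[0], t), lerp(p0[1], p1[1], t))


# Hypothetical pan: the clip moves 400 px left and 50 px down over 100 frames.
clip_keys = [(0, (0.0, 0.0)), (100, (-400.0, -50.0))]

# The text sits at a fixed offset from the clip and gets the SAME movement.
text_offset = (120.0, 80.0)
text_keys = [
    (frame, (x + text_offset[0], y + text_offset[1]))
    for frame, (x, y) in clip_keys
]


def relative_offsets(frames):
    """Text position relative to the clip at each frame (rounded)."""
    result = []
    for frame in frames:
        cx, cy = position_at(clip_keys, frame)
        tx, ty = position_at(text_keys, frame)
        result.append((round(tx - cx, 6), round(ty - cy, 6)))
    return result


if __name__ == "__main__":
    # Every frame yields the same offset, so the text never drifts.
    print(set(relative_offsets(range(0, 101))))
```

If the two layers had different keyframe values or different interpolation curves, the offset would change from frame to frame, and that is what shows up as the text sliding against the picture.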
Otherwise, if the camera motion or the motion of the object is already in your original clip, you can keyframe the text position by hand in FCP X, but it's very hard to make it look as good and as smooth as in the reference video. Apple Motion has a match-move feature that does this very well. Here is one of many, many tutorial videos that show how to use it: