The actual data likely travels directly peer-to-peer, but the presence framework and NAT traversal MAY require the assistance of a common third-party (read: Apple) server.
It's a poor analogy to draw, but this is not terribly unlike an iChat AV or Skype session.
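
To make the split concrete, here is a rough sketch of the idea, not a description of how Apple actually does it. It assumes a hypothetical `RendezvousServer` that only records each peer's publicly visible address and hands those addresses out on request; all names and addresses are made up for illustration, and real NAT traversal (hole punching, relay fallback) is left out entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# (public IP, public port) as the server would observe it. Illustrative only.
Endpoint = Tuple[str, int]


@dataclass
class RendezvousServer:
    """Hypothetical third-party server: tracks presence, never relays data."""
    peers: Dict[str, Endpoint] = field(default_factory=dict)

    def register(self, peer_id: str, observed: Endpoint) -> None:
        # Presence: remember where this peer can currently be reached.
        self.peers[peer_id] = observed

    def lookup(self, peer_id: str) -> Optional[Endpoint]:
        # Returns None if the peer is not online (not registered).
        return self.peers.get(peer_id)


def introduce(server: RendezvousServer, a: str, b: str) -> Optional[Dict[str, Endpoint]]:
    """Tell each peer the other's endpoint; the data path then bypasses the server."""
    ep_a, ep_b = server.lookup(a), server.lookup(b)
    if ep_a is None or ep_b is None:
        return None  # one side is offline, so no session can be set up
    return {a: ep_b, b: ep_a}


server = RendezvousServer()
server.register("alice", ("203.0.113.5", 40001))
server.register("bob", ("198.51.100.7", 40002))
targets = introduce(server, "alice", "bob")
```

After `introduce`, each side knows where to send packets directly; the server's role ends at the introduction, which is the division of labor guessed at above.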