Hmm, it'll be interesting to see what Unicode does to deal with the increasing complexity of emoji. As of right now, country flags are stored as two separate characters (a pair of regional indicator symbols) and rendered as a single glyph. Skin tone modifiers add another character to a standard emoji as well.
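If you want to see this for yourself, here's a quick Python sketch. The `flag` helper is just something I made up for illustration (it assumes a two-letter ASCII country code), and whether the pair actually shows up as one glyph depends on your font and terminal.

```python
import unicodedata

# Code point of REGIONAL INDICATOR SYMBOL LETTER A; B..Z follow in order.
REGIONAL_INDICATOR_A = 0x1F1E6


def flag(country_code: str) -> str:
    """Hypothetical helper: map a two-letter code like "US" to its
    pair of regional indicator symbols."""
    return "".join(
        chr(REGIONAL_INDICATOR_A + ord(letter) - ord("A"))
        for letter in country_code.upper()
    )


us = flag("US")

# A base emoji followed by a skin tone (Fitzpatrick) modifier.
wave = "\U0001F44B" + "\U0001F3FD"

for label, text in (("flag", us), ("wave", wave)):
    # Each string is two code points, even if it renders as one glyph.
    print(label, len(text), [unicodedata.name(ch) for ch in text])
```

Both strings have a length of 2 in Python, which counts code points, not the glyphs you see on screen.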
I propose that eventually, emoji will become so complex that individual bits in each character will be mapped to a specific pixel in an image. I've started prototyping myself, and I've created a few proof-of-concept "bitmaps". You have the Just Pretty Exceptional Graphics, or JPEG for short (but I can't imagine it'll catch on), which allows for a lot of compression. The Perfectly Nice Graphic uses crazy lossless compression and allows for transparency. I've also developed an animated format called the Great Image—you probably don't care, anyway.
I'm just hypothesizing about things to come. Could be wrong.