But 720p video streams aren't very heavy. Makes more sense if they want to put the characters into some interactive environment like a video game.
You're thinking emojis, not memoji. Two different things.
Everyone is shooting down the idea of memojis, but there are some very good reasons for using them over a video stream. Only "instructions" need to be sent between parties, and those can be highly compressed, so it's super efficient.
Once a 3-D face is loaded, the rest is just orientation and modification instructions, such as winks and smiles.
I applaud Apple for thinking different.
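To put a rough number on the "instructions only" idea, here's a minimal sketch in Python. Everything in it is an assumption made for illustration: the expression names, the one-byte-per-value quantization, and the wire layout are hypothetical and not Apple's actual format. It just shows how small a per-frame update can be once the 3-D face is already loaded on the other end.

```python
import struct

# Hypothetical expression channels; a real system would likely have many more.
EXPRESSIONS = ("smile", "wink_left", "wink_right", "jaw_open")


def _quantize(value, lo, hi):
    """Clamp value to [lo, hi] and map it to an unsigned byte (0..255)."""
    clamped = max(lo, min(hi, value))
    return round((clamped - lo) / (hi - lo) * 255)


def _dequantize(byte, lo, hi):
    """Map an unsigned byte (0..255) back to a float in [lo, hi]."""
    return lo + byte / 255 * (hi - lo)


def encode_frame(yaw, pitch, roll, weights):
    """Pack head angles (degrees, -90..90) and expression weights (0..1).

    The layout (one byte per value) is an assumption for this sketch.
    """
    angles = [_quantize(a, -90.0, 90.0) for a in (yaw, pitch, roll)]
    exprs = [_quantize(weights.get(name, 0.0), 0.0, 1.0) for name in EXPRESSIONS]
    return struct.pack(f"{len(angles) + len(exprs)}B", *angles, *exprs)


def decode_frame(payload):
    """Unpack a frame produced by encode_frame back into angles and weights."""
    values = struct.unpack(f"{3 + len(EXPRESSIONS)}B", payload)
    yaw, pitch, roll = (_dequantize(v, -90.0, 90.0) for v in values[:3])
    weights = {
        name: _dequantize(v, 0.0, 1.0)
        for name, v in zip(EXPRESSIONS, values[3:])
    }
    return yaw, pitch, roll, weights


def bitrate_kbps(bytes_per_frame, fps):
    """Raw payload bitrate in kilobits per second, ignoring packet overhead."""
    return bytes_per_frame * 8 * fps / 1000


frame = encode_frame(12.0, -5.0, 0.0, {"smile": 0.8, "wink_left": 1.0})
print(len(frame), "bytes per frame")
print(bitrate_kbps(len(frame), 30), "kbit/s at 30 fps")
```

With these made-up channels that's 7 bytes per frame, or 1.68 kbit/s at 30 fps before any packet overhead or further compression. Even with a few dozen expression channels it would stay far below what a 720p video stream needs, which is typically on the order of a megabit per second.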