Facebook Messenger does something like this much better. It’s still pretty useless, but the Apple implementation feels like it struggles with the simplest things. My brother just broke his arm, so I tried to make him an emoji with a broken arm. Apple failed to do anything useful and kept dressing him up like a doctor. It’s utterly stupid. I don’t see myself trying it much again at this point.
There are multiple elements to this functionality. At the very least we have:
1. the language model (translates the phrase into a "latent space")
2. the image generation model(s) (at least two of them, one cartoon-like, one "Illustrator"-like)
3. constraints on the language model so it doesn't offend anyone ever
4. constraints on both models to limit how long generation takes and how much energy it uses
In my experience the biggest problem that can be fixed soon is 1. The language model is just too limited and dumb, and clearly way simpler than the "intrinsic" Apple language model (which can handle, e.g., translation adequately).
2. should be improved (more options for image style) simply because doing so is fairly easy, and it would please many of the people who are irritated by the cartoony style.
3. is (I suspect) hurting a lot, including making 1. extra dumb. But this is the world we inherited over the past decade. I suspect US elite society will tone this down over the next decade, but religious movements change more slowly than one might want.
4. is, I think, the weakest constraint. It will be improved in SW every few months and HW every year, and doesn't worry me.
So that's my expectation:
- more image styles soon (maybe as early as 18.3); being able to include two people would also be a significant boost in functionality, but that may have to wait till 19
- better language model soon (but maybe not till 19)
- a long, slow whittling away at the limitations of what can be done, probably led by Grok, with Claude and OpenAI next, Google behind them, and Apple last of all
- on-going tweaks to make image quality better (at the cost of more compute) but changing so slowly that there's never an obvious jump
Another direction Apple may go (and we may see this as early as 19) is APIs that allow third parties to act as the preferred Genmoji and Image Playground creators. This seems plausible both on technical grounds (Apple looks better if, in response to complaints that their image gen tech sucks, they can say "if it's important to you, switch to using <App X> or <OpenAI> or whatever") and on legal grounds (given the EU kerfuffle, it seems stupid not to provide obvious hooks for companies that are obviously looking for ways to become part of the system)...