It seems like you answered your own question in your post. Being able to be in another place without having to travel is an end in itself, because time is a valuable resource that only increases in value as one gets older (you have less and less of it).
Zuckerberg's metaverse failed not because he didn't figure out the "what" for people already in the metaverse but because there is a disconnect between what goes on in the metaverse and reality (as you so aptly put it). On the contrary, Facebook succeeded because what goes on on Facebook can have a real and significant impact on a user's reality.
Apple and Meta are approaching VR from opposite directions, but they're heading towards the same destination. Apple's approach is rooted in, as you said, convenience, and by extension, reality. It tries to create virtual homologues of things that already exist in reality, e.g., Spatial Personas and Personal Voice. Meta is going about it the way it went about Facebook: create a virtual world → get people into said virtual world → have them stay there. But ultimately, with enough people in the metaverse, it will start to have a real-world impact as people interact, exchange ideas, and make connections. The same goes for Apple. When there are enough homologous traits of you, e.g., face and voice, to essentially duplicate your presence in the virtual world, people will start to yearn for a "metaverse". It might even arise organically.
When I say the "and then what" needs figuring out, I mean that it's more than just answering the question. As a manager, I think of the next steps as:
1. More completely fleshing out the problem statement.
2. Determining use cases/user stories that fit the problem statement.
3. Scoping the components, project milestones, and success metrics for delivery of the solution.
But the precondition to all this is quantifying the size of the addressable market and its projected growth, in more specific terms than the broad ones I've laid out. This is where the rubber meets the road... How do you finance this project and have it pay dividends at each project milestone? What does that look like?
Maybe that doesn't actually look like Vision Pro. Maybe it really begins with taking Task Automation and Siri to another level. Right now, Siri is a virtual assistant, but what if it were instead a virtual extension of one's self? What if the most common things you do in a day could be automated, so that your coffee is ready, your playlist for the day is set up, your news and social media feeds are configured to the most relevant things on your mind right now, your pre-read for the 9:30 meeting is on your desktop, annotated with highlights to call your attention to key sections that impact decisions you have to make, and your reminders actually execute the tasks, from paying your water bill to scheduling an Amazon return, all by the time you wake up? How many hours would that free up for you?
Graphical virtualization of the self is far less compelling if it doesn't come with the ability to get the work of living done so you can spend more time actually living, instead of the running dystopian joke about teaching an A.I. to paint for you so you can spend more time working. And by then you start to see that the graphical virtualization of the self is completely superfluous here. It does not offer any advantage. Our presence is not just in how we appear, but in what we do. I already don't turn my camera on for 99% of the meetings I'm in.
This is just an example of how the roadmap might actually begin... With capacitive-sensing multitouch, it actually began quietly, on the Magic Mouse... and as people acclimated to the idea in one corner of their lives, and it proved to be of value (by eliminating moving parts that wear down), the transition to more sophisticated implementations of multitouch became more palatable.
That's the thing Apple does so well that most people don't even realize it... At one point, even Apple didn't realize it. Remember HyperCard? It was HTTP before HTTP, except John Sculley never understood that.