I have a similar concern. We need to wait for all the videos/dev sessions on how to create different experiences on visionOS (https://developer.apple.com/visionos/learn/). If you take a look at some of the video thumbnails, it looks like they will demonstrate building more complex 'full 3D with world interaction' style apps rather than these floating windows. What you're describing right now can be accomplished on the iPhone using ARKit and RealityKit, but I don't know how complex and detailed those scenes can be.
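For reference, here's roughly what that looks like on iOS today -- a minimal ARKit + RealityKit sketch that anchors an interactive box to a detected surface. The class and variable names are just illustrative, and asset loading/error handling are omitted:

```swift
import ARKit
import RealityKit
import UIKit

// Minimal sketch: anchor an interactive 3D object to a real-world
// surface with ARKit + RealityKit on iOS. Names are illustrative.
class ARDemoViewController: UIViewController {
    let arView = ARView(frame: .zero)

    override func viewDidLoad() {
        super.viewDidLoad()
        arView.frame = view.bounds
        view.addSubview(arView)

        // World tracking with horizontal plane detection.
        let config = ARWorldTrackingConfiguration()
        config.planeDetection = [.horizontal]
        arView.session.run(config)

        // Attach a simple procedural box to the first detected plane.
        let anchor = AnchorEntity(plane: .horizontal)
        let box = ModelEntity(
            mesh: .generateBox(size: 0.1),
            materials: [SimpleMaterial(color: .systemBlue, isMetallic: false)]
        )
        box.generateCollisionShapes(recursive: true) // needed for gestures
        anchor.addChild(box)
        arView.scene.addAnchor(anchor)

        // Standard drag/rotate/pinch interaction with the entity.
        arView.installGestures([.translation, .rotation, .scale], for: box)
    }
}
```

Nothing in there stresses the GPU, which is kind of the point: anchoring simple content in the world is easy; dense, high-fidelity scenes are the open question.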
So why didn't they demo more of what you're describing? I have two theories:
#1: Vision Pro is a computer, not a gaming or "3D Experience" device
I think Apple are focused on convincing people that this is a new paradigm of computing rather than a VR 'experience' headset. VR goggles right now have a reputation for being 3D-game focused, and AR goggles have a reputation for being shoddy.
Apple correctly identified that waving your arms in the air is not conducive to a comfortable computing experience, so that's not the primary method of interacting with the device. Yes, the device CAN detect your arms waving around, which is useful for a game, but that's not what people want to do with a generalized computing platform, so Apple didn't focus on it (it got a brief mention).
We will see 3D-first novel productivity platforms in the future, a lot of them built on visionOS no doubt, but Apple need to make the experience familiar, at least initially, so people understand its utility and can relate it to how they already use their Apple devices. People liked the iPad because it was just a bigger version of what they already loved about their iPhone -- visionOS is certainly a bigger leap than that but to some degree Apple are intelligently leaning on that iPad strategy for the introduction.
They're picking their battles. They're pitching the practicality of the paradigm (infinite canvas, how the experience scales *literally*) rather than also trying to reinvent the process of productivity (introducing a 3D-native asset akin to the document, or a new task-tracking system) at the same time, because doing both at once risks losing the average person.
#2: Vision Pro can't compete on VR scene fidelity
If you take a look at some of the leading VR games like Half-Life: Alyx, or a detailed racing sim on a headset like the Valve Index, you need a powerful discrete GPU to run them at a sufficiently high FPS, and those headsets have nowhere near the resolution of Vision Pro (which has approx. 5x the number of pixels of a Valve Index).
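Back-of-envelope on that 5x, for anyone who wants the numbers: the Index runs 1440 x 1600 per eye, so ~4.6M pixels total, while Apple quotes 23M+ pixels for Vision Pro; 23 / 4.6 ≈ 5.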
So think about running complex 3D environments at 90+ fps across 23 million pixels on an onboard M2, the same chip that ships in a MacBook Air. Oh, and the chip can't get too hot, because it's right next to the user's face. Last I checked, the MacBook Air cannot run demanding 3D games well (outside of simple arcade-style titles), and its display has fewer pixels than a single eye of Vision Pro.
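Rough throughput math, assuming ~90Hz: 23M pixels x 90 frames/sec ≈ 2 billion pixels to shade every second, before any compositing overhead -- presumably why techniques like foveated rendering (full resolution only where the eye is pointed) matter so much on this class of hardware. For scale, the MacBook Air panel is 2560 x 1664 ≈ 4.3M pixels, against roughly 11.5M per eye here.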
Apple probably avoided showing rich 3D scenes because the device simply can't handle them. I don't think Apple necessarily care about that right now, because they don't believe gaming and AR gimmicks (physics toys, projecting stuff onto the world) are how you convert people; Oculus have already tried that and failed. After trying to use Magic Leap and various other VR headsets for productivity, I'm inclined to agree.
I can't bloody wait for this thing.