Apple's Struggles With Long-Rumored AR/VR Headset Detailed in New Report

bobcomer · May 18, 2022

DarthBuzzard said:
So it can be a floating arrow for each corner you need to take. There are various ways this can go down.

Like listening to the directions and using my eyes to see everything else. I only have so much visual bandwidth to go around, so splitting what I need between senses cuts down on the clutter.

The "can't see the forest for the trees" is the perfect literal way to explain the problem.

kc9hzn · May 18, 2022

DarthBuzzard said:
AR is not a heads-up display at all. You are thinking of Google Glass, which is a different type of technology.

Here are some video examples of how AR can edit reality.

https://twitter.com/x/status/1390748451147681795

https://twitter.com/x/status/1309066544924766210

https://twitter.com/x/status/1446059727658520582

https://twitter.com/x/status/1397516018306600962

It’s all still just visual overlays. You could potentially change people’s perceptions of reality, but you’re not changing the underlying physical reality. You take the glasses off, and the changes all vanish.

Before you start advocating for cybernetic enhancement implants, though, there are some major moral and ethical discussions we need to have about such things and the ability for people to literally re-write how potentially non-consenting people see and make sense of the world!

DarthBuzzard · May 18, 2022

kc9hzn said:
Gesturing in space doesn’t sound terribly appealing, though. It’s the same problem vertical touch screens have. They demo well and they’re fine for very brief interactions, but you wouldn’t want to type out a message or edit a photo by reaching out to a physical screen in front of you even with all the feedback the screen gives you. To do the same with a virtual screen seems awful. (Also, what even are the Fitts’s Law attributes of an AR display?)

You might respond to that by saying “well, maybe we shouldn’t expect people to perform gestures on a virtual screen, maybe they should just gesture in free air”. Well, let’s walk through a specific use case, a fairly simple one. Let’s say you want to choose a picture from your photo album. Do we create a fake photo album that you literally thumb through, or do we show a gallery of image thumbnails that you swipe through? The former is certainly very skeuomorphic, but it misses the essential feedback physical items provide. The latter is rather 2D-screen centric, sure, but it’s probably more natural with the gestures available to us. But we’d still need something like inertial scrolling, then we’d need to use fairly fine motor skills to prevent overshooting the photo we want. All the while, you’re gesturing in a space where people might move into. Oh sure, you shouldn’t gesture in a space where people might move into, but, judging on how many people use their smartphones, I don’t think we can necessarily trust people to practice self-restraint when it comes to playing with AR interfaces. Far better would be voice, visual cues, or some method of quasi-intelligence to show just the information that’ll be most useful for the user. AR glasses would actually probably be a fairly passive experience in most non-immersive contexts.

This is why Facebook/Meta are so focused on an EMG wristband as a form of input. Since it can read the nerve signals the brain sends to the hand, picked up at the wrist, it can register the tiniest of muscle movements, leading to something like this:

DarthBuzzard · May 18, 2022

kc9hzn said:
It’s all still just visual overlays. You could potentially change people’s perceptions of reality, but you’re not changing the underlying physical reality. You take the glasses off, and the changes all vanish.

Before you start advocating for cybernetic enhancement implants, though, there are some major moral and ethical discussions we need to have about such things and the ability for people to literally re-write how potentially non-consenting people see and make sense of the world!

Yes, but perception is our experience of the world. Which is why it's profound. And because AR can be networked, it can be a shared perceptual experience.

Of course it won't physically change hard concrete or anything like that, but it is still unlike anything we've had.

DarthBuzzard · May 18, 2022

bobcomer said:
Like listening to the directions and using my eyes to see everything else. I only have so much visual bandwidth to go around, so splitting what I need between senses cuts down on the clutter.

The "can't see the forest for the trees" is the perfect literal way to explain the problem.

How many people are good with audio directions though? We are more visually orientated creatures. Some will be fine with just audio, but a lot of people aren't great with it.

kc9hzn · May 18, 2022

bobcomer said:
Like listening to the directions and using my eyes to see everything else. I only have so much visual bandwidth to go around, so splitting what I need between senses cuts down on the clutter.

The "can't see the forest for the trees" is the perfect literal way to explain the problem.

That’s actually a very legitimate point, at least vis-a-vis operating automobiles via AR. You could say, “well, it should be visible dots on or above the road showing you the way to travel, a la runway lights”, but it’s very likely that some people will take those dots as permission to drive into obstacles (say, ignoring road work signs that were just put up overnight) or will hyperfocus on the dots and cause accidents. AR navigation is probably best left for foot navigation, perhaps indoors. (Could be useful at a mall, for finding a gate at an airport, for finding the right platform at a train station, or making a bus or train transfer in a public transportation plaza/station.)

kc9hzn · May 18, 2022

DarthBuzzard said:
Yes, but perception is our experience of the world. Which is why it's profound. And because AR can be networked, it can be a shared perceptual experience.

Of course it won't physically change hard concrete or anything like that, but it is still unlike anything we've had.

But, if it’s going to be networked, we need to discuss ethics and morals, though. Because it’s only a matter of time until some jerk puts up AR penises all over the place or neo-nazis put swastikas and Nazi flags all around some synagogue. And you could do some huge targeted harassment via networked AR, especially if there’s a tactile feedback component to it. People are gonna be depraved jerks with this new technology, just like they always are. In the name of harm reduction, we need to make a plan for such things well before they become reality (or even “reality”).

DarthBuzzard · May 18, 2022

kc9hzn said:
But, if it’s going to be networked, we need to discuss ethics and morals, though. Because it’s only a matter of time until some jerk puts up AR penises all over the place or neo-nazis put swastikas and Nazi flags all around some synagogue. And you could do some huge targeted harassment via networked AR, especially if there’s a tactile feedback component to it. People are gonna be depraved jerks with this new technology, just like they always are. In the name of harm reduction, we need to make a plan for such things well before they become reality (or even “reality”).

There are indeed some real concerns here, and it may take some really good object recognition to recognize bad user generated content.

bobcomer · May 18, 2022

DarthBuzzard said:
How many people are good with audio directions though?

As many as drive cars and use their Navs.

DarthBuzzard said:
We are more visually orientated creatures.

I wonder about that. It's more like a generalization that is hard to determine. I'm definitely not visually oriented, so there's at least one. I'd bet that anyone with slightly non-standard vision is more like me than not. All those people wearing glasses and contacts, among other vision problems. Did you know that around 20% of the people out there can't see stereoscopically? That would have to be taken into account in AR as it's not a small number of people. I'm one of them too...

RadioHedgeFund · May 18, 2022

4jasontv said:
Exactly. Most directions are currently given for driving, which encourages taking your eyes off the road to see your phone. Out of the car, we have few options for say, finding a particular store at a mall.

Or for the deaf / hard of hearing needing it without translation. Or being able to offer tourism of buildings and natural wonders that can't easily be curated.

It's hard to imagine how games might be transformed with a little technology. Tag, for example, could show a counter for how many times or how long someone has been 'it'.

This is a great example. My SO always wants captions on, and I find them distracting because I read them before the actor says their line.

The ability to quiz yourself as you get more knowledgeable.

Or help people understand how different brands differ. It could make it clear that one product has an ingredient you are trying to avoid due to allergies (nuts) or health concerns (corn syrup).

I would give anything to be able to hear my grandmother retell a story about her childhood as we drove through town.

True. I was thinking more from a consumer standpoint. Allowing for graphical overlays while at the live event, or when at home practicing with your kid.

And help prevent installing a part upside down...

Exactly. One issue with that tech now is that you have to be in the location you want to add the item to. Imagine being able to recreate the location while looking at the real item in the store. Or being able to simulate how a garden might look with different plants that bloom at different times of the year.

Making life better for the differently abled is a noble ideal and arguably an industrial use, i.e. healthcare.

Making members of the public even more dependent on technology than they already are is not altruistic and borders on sociopathy. Steve Jobs' original ideology of wanting to make the technology sit in the background whilst people gaze upon the magical wonderment is disabling. The reaction to this in recent years has been the STEM/Maker movement and arguably started when Woz left Apple because he could see where they would eventually end up.

Technology should serve to make life easier, but it should also encourage the uptake of skills, not remove the need for them.

DarthBuzzard · May 18, 2022

bobcomer said:
As many as drive cars and use their Navs.

I wonder about that. It's more like a generalization that is hard to determine. I'm definitely not visually oriented, so there's at least one. I'd bet that anyone with slightly non-standard vision is more like me than not. All those people wearing glasses and contacts, among other vision problems. Did you know that around 20% of the people out there can't see stereoscopically? That would have to be taken into account in AR as it's not a small number of people. I'm one of them too...

I have a few family members who struggle with audio directions. I mean they get to their destination in the end, but sometimes make a wrong turning or get stressed about it. While it works, it isn't perfect. There are struggles.

When it comes to learning studies, the typical outcome is that most people are visual learners. This isn't the same as applying it to a car, but it gives you an idea that we tend to be visual-orientated.

I will say that AR navigation overlays make more sense on-foot than in a car though. That's just easier to handle.

4jasontv · May 18, 2022

RadioHedgeFund said:
Making members of the public even more dependent on technology than they already are is not altruistic and borders on sociopathy. Steve Jobs' original ideology of wanting to make the technology sit in the background whilst people gaze upon the magical wonderment is disabling. The reaction to this in recent years has been the STEM/Maker movement and arguably started when Woz left Apple because he could see where they would eventually end up.

Help me understand which example that I gave falls under this sociopathy category?

kc9hzn · May 18, 2022

DarthBuzzard said:
There are indeed some real concerns here, and it may take some really good object recognition to recognize bad user generated content.

Considering how poorly penis detection algorithms have worked in sandbox games, I hope you’ll forgive me for not being terribly optimistic about object recognition. It’s easy for humans, but it’s fundamentally a very hard problem for computers, especially since computers lack the context cues that humans have. Humans have a contextual understanding of what things represent and mean that computers lack. After we’ve told a computer what a dog is, it can do a decent job of recognizing dogs. But it doesn’t really understand what a dog is. It’s a little like ascribing attributes to a book (good, bad, sad, funny) and asking a computer to characterize other books based on those. The computer may make decent recommendations, in general, but it doesn’t really understand why the book is good, just that books with x, y, and z characteristics are typically considered “good”, and this book has a blend of characteristics that would make it “good”. It doesn’t really understand the mental processes that lead to humans enjoying the book.

This makes the whole object recognition thing particularly hard for computers. While it could probably learn to recognize chairs, for instance, it might be a lot worse at recognizing the broader category of “phallic-suggesting items”. Is a giant pencil in the category of “phallic-suggesting items”? ¯\_(ツ)_/¯ Possibly? It really depends on context. In a world of oversized books and desks, probably not. In a porno with a woman in a schoolgirl’s outfit? Most likely. Outside of any context or in the ambiguous context of user generated content? It’s likely impossible to tell. Some things in the “phallic suggesting items” category may be more obviously phallic without context than others, but, in say, a Lego game, how do you distinguish Lego brick penises from, say, Lego brick obelisks?

So I’m actually not terribly optimistic about computer vision as a solvable problem, at least as a general use, scalable solution. Computer vision is likely going to be limited to recognizing certain types of things, as opposed to being able to recognize any object you put in front of it. And, in a world as open to user generated content as, say, Second Life (which had its own scripting engine for creating interactive objects whose whole state is user generated), you’d need the general solution.

turbineseaplane · May 18, 2022

SpectatorHere said:
Think about the switch of humanity and computers melding together. We'll live in the OS.

That sounds absolutely overwhelming and awful

I think I'm going to leave that to some future generation

Lounge vibes 05 · May 18, 2022

mikethemartian said:
Maybe in the iPod and iPhone, but it was very easy to switch out the battery in a few minutes on the MacBook Pro.

Steve Jobs resigned from Apple in August 2011.
Non-user accessible batteries started in the MacBook Pro in 2009.
So… you were saying?

SpectatorHere · May 18, 2022

turbineseaplane said:
That sounds absolutely overwhelming and awful

I think I'm going to leave that to some future generation

I think our ancestors would probably feel like we already went too far. With cell phones we're always online and ready to communicate or keep up with what's going on. Unless you're doing truly manual labor, you're probably using computers in great part to carry out your work. We put in ear pods, turn on the car stereo, or sit in front of our television to enjoy our surroundings. AR is the next leap down this path of evolving into our technology. We are becoming cyborgs whether we like it or not.

SpectatorHere · May 18, 2022

kc9hzn said:
Considering how poorly penis detection algorithms have worked in sandbox games, I hope you’ll forgive me for not being terribly optimistic about object recognition. It’s easy for humans, but it’s fundamentally a very hard problem for computers, especially since computers lack the context cues that humans have. Humans have a contextual understanding of what things represent and mean that computers lack. After we’ve told a computer what a dog is, it can do a decent job of recognizing dogs. But it doesn’t really understand what a dog is. It’s a little like ascribing attributes to a book (good, bad, sad, funny) and asking a computer to characterize other books based on those. The computer may make decent recommendations, in general, but it doesn’t really understand why the book is good, just that books with x, y, and z characteristics are typically considered “good”, and this book has a blend of characteristics that would make it “good”. It doesn’t really understand the mental processes that lead to humans enjoying the book.

This makes the whole object recognition thing particularly hard for computers. While it could probably learn to recognize chairs, for instance, it might be a lot worse at recognizing the broader category of “phallic-suggesting items”. Is a giant pencil in the category of “phallic-suggesting items”? ¯\_(ツ)_/¯ Possibly? It really depends on context. In a world of oversized books and desks, probably not. In a porno with a woman in a schoolgirl’s outfit? Most likely. Outside of any context or in the ambiguous context of user generated content? It’s likely impossible to tell. Some things in the “phallic suggesting items” category may be more obviously phallic without context than others, but, in say, a Lego game, how do you distinguish Lego brick penises from, say, Lego brick obelisks?

So I’m actually not terribly optimistic about computer vision as a solvable problem, at least as a general use, scalable solution. Computer vision is likely going to be limited to recognizing certain types of things, as opposed to being able to recognize any object you put in front of it. And, in a world as open to user generated content as, say, Second Life (which had its own scripting engine for creating interactive objects whose whole state is user generated), you’d need the general solution.

I'm not betting against AI. Everything's going to be networked, and the Cloud AI will be trained on all of it.

I'd guess the AI will be better at object recognition than us within the next decade. Since this probably won't be done locally for a while, my guess is developing the wireless networking bandwidth that can be accessed efficiently with the wearable glasses/phone will be a big challenge.

SpectatorHere · May 18, 2022

These glasses will use always-on video, always capturing video (and audio). Going to be crazy when we're able to save and store all of it digitally. Going to have to develop AI just to index our life-feeds. Then, no need to remember anything.

Brave new world incoming if we somehow don't ruin it all with war or climate.

RadioHedgeFund · May 18, 2022

4jasontv said:
Help me understand which example that I gave falls under this sociopathy category?

It’s more Apple’s never ending quest to make its customers totally dependent on the company for all their digital needs. I’m not denying how useful the ecosystem is but there are users out there that cannot and will never escape it. Apple seeks to harm and undermine the long-term skills and knowledge of their customers for their own benefit which is classic sociopathy.

urnotl33t · May 18, 2022

neuropsychguy said:
Being able to understand others when you speak a different language is dumb?

Google AR Glasses with Live Translations Could Change the World

Google announced augmented reality glasses that show real-time transcription and translation of nearby voices, and it could change the world.

nerdist.com

There are thousands of other AR applications, many of which haven't yet been conceived.

VR is not for everyone but it has its uses as well. I'm even starting to build it in to one of my university courses to supplement learning for those who are interested in using it.

Yes that Google thing is nice, but it requires BOTH parties to have it. That's not a market. That's literally half of a market. A "mar", as it were.

urnotl33t · May 18, 2022

NT1440 said:
Still a gimmick yes, but we’re less than 10 years away from having the ability to project images and info onto the real world from what looks like a regular pair of glasses. *That* is going to open up whole new avenues of “ambient” computing.

Personally, I work on cars a lot. The first company that makes a Haynes manual-level AR assistant for working under the hood is going to have a customer in me.

Ok I'm intrigued.. I'll follow along...

NT1440 said:
Can’t see where a hose snakes to? Okay, just overlay the schematics so I can see where it *should* go.

Mmm maybe, seems like this would involve X-rays, or worse, as well.. hella dangerous. Still an interesting thought, keep going...

NT1440 said:
Don’t know what size wrenches I’m supposed to use before crawling under the car? No problem, the sizes, lengths, and torque specs will just be there when I look at them.

Uh, it's written right there on the page of the dead-tree book you already have. You said "before crawling under". This is not solved by FooR glasses. Or perhaps WHILE under, it could show you where bolts are and their sizes.. without Hulk-spawning gamma rays. 😆

NT1440 said:
That kind of thing is easily 10+ years away as a platform needs time to build content, but there is massive potential in just industry alone. They’re doing this today with HoloLens, but the tech to cram it into regular glasses grows tantalizingly closer every year….

Perhaps, but this still isn't "mass market appeal" that Apple needs in order to pay for their magic-in-a-box device.

NT1440 said:
I’m not excited about AR for any of the Social crap (I don’t use social media), it’s all the OTHER use cases that excite me. I hope the Metaverse crashes and burns spectacularly.

Agreed, again.

kc9hzn · May 19, 2022

SpectatorHere said:
I'm not betting against AI. Everything's going to be networked, and the Cloud AI will be trained on all of it.

I'd guess the AI will be better at object recognition than us within the next decade. Since this probably won't be done locally for a while, my guess is developing the wireless networking bandwidth that can be accessed efficiently with the wearable glasses/phone will be a big challenge.

I suppose I am betting against AI in a few specific ways. I doubt generalized AI is even possible, human level cognition from a machine seems like an utter pipe dream. To be blunt, true human level AI probably requires a biological body. Heck, we don’t understand enough about how our brains even work to be able to hope to replicate it in silicon. It’s probably undesirable, too, as personality and will will presumably follow from it if it’s possible. Ready for the AI uprising? (I’m mostly joking, but there are serious ethical concerns that arise from man-made human level intelligences, such as what are the rights of a general AI?)

Now, for unspecialized AI, I think it’s a bit hubristic to think that we could do better than humans (in a fairly short order of time, least of all 10 years) at tasks that the human brain seems especially geared towards. Object recognition is one of those cognitive functions that operates at faster-than-consciousness speeds in humans. It’s like hearing your own name in a crowd: you don’t have to process it, because a part of your brain operates at speeds faster than general thought, enabling you to react faster than if you had to think “that’s my name, who said my name, let’s turn around, oops they’re talking to someone else”. Instead, it’s [trigger], turn around, realize they’re talking to someone else of the same name. Our brains seem to have dedicated regions for handling those kinds of processing tasks. Also, re: object recognition specifically, and continuing on penis detection, it’s not necessarily useful for a computer to be objectively better at object recognition than humans in a situation like this, it needs to make the same sorts of mental connections a human would. It needs to have generalized AI. There also seem to be legitimate issues with cross-training AIs to do multiple tasks, so the only way to handle that seems to be to run a bunch of neural networks all trained to do different things.

Also, I’ve done enough captchas over the years that I’ve seen Google captchas that have clearly been assigned by an AI, an AI that very clearly got it wrong. “Choose all the X”, then there’s very clearly an image that isn’t X, but I could see how a computer would mistake it for X. Inevitably, I get it wrong the first time because the computer thought that the non-X was an X and I went with the right answer (and not the desired answer). Or captchas where you have to select all of the tiles that contain part of an X. By intuitive geometry, I know that a very small part of X is going to be in this other image, but I got the captcha wrong because the computer doesn’t process that image as containing part of X.

sideshowuniqueuser · May 19, 2022

Abazigal said:
Yeah, I suppose growing the iPhone base to over 1 billion active users, building a formidable ecosystem of hardware, software and services, and making Apple a 2+ trillion company would technically count as “not ruining Apple”.

Not by a long shot.

Exactly. Iterations.

sideshowuniqueuser · May 19, 2022

kc9hzn said:
Gesturing in space doesn’t sound terribly appealing, though. It’s the same problem vertical touch screens have. They demo well and they’re fine for very brief interactions, but you wouldn’t want to type out a message or edit a photo by reaching out to a physical screen in front of you even with all the feedback the screen gives you. To do the same with a virtual screen seems awful. (Also, what even are the Fitts’s Law attributes of an AR display?)

You might respond to that by saying “well, maybe we shouldn’t expect people to perform gestures on a virtual screen, maybe they should just gesture in free air”. Well, let’s walk through a specific use case, a fairly simple one. Let’s say you want to choose a picture from your photo album. Do we create a fake photo album that you literally thumb through, or do we show a gallery of image thumbnails that you swipe through? The former is certainly very skeuomorphic, but it misses the essential feedback physical items provide. The latter is rather 2D-screen centric, sure, but it’s probably more natural with the gestures available to us. But we’d still need something like inertial scrolling, then we’d need to use fairly fine motor skills to prevent overshooting the photo we want. All the while, you’re gesturing in a space where people might move into. Oh sure, you shouldn’t gesture in a space where people might move into, but, judging on how many people use their smartphones, I don’t think we can necessarily trust people to practice self-restraint when it comes to playing with AR interfaces. Far better would be voice, visual cues, or some method of quasi-intelligence to show just the information that’ll be most useful for the user. AR glasses would actually probably be a fairly passive experience in most non-immersive contexts.

Yeah, it sounds horrible. Even tapping on a glass screen with a "keyboard" drawn on it is horrible to me, thus why an iPad will never replace my MBP. I'm just trying to imagine the possibilities.

Abazigal · May 20, 2022

sideshowuniqueuser said:
Exactly. Iterations.

Which I don't feel is in itself a bad thing, considering the current state of the competition.
