Not everything available to a web crawler can be used for AI training purposes. I doubt Apple reads the license agreements on any of the websites it crawls.
"I guess if you educated yourself, you'd know that Apple Intelligence is mainly centered around you and your devices, not "worldly" data, as they've stated, and that's the reason why they've decided to add 3rd party LLM services as optional services."

It's very clear that you don't know how AI works. Without training, AI won't work, even if it's only for your devices. How do they even get mass amounts of data to train with?
The web is considered "public", and reading from it, i.e. gaining knowledge, is the entire point of it. When you read articles from this site and learn something, did you pay MacRumors for the knowledge? Do you keep that knowledge to yourself, or do you tell other people about it? Do you give them the link so MacRumors can get the ad revenue?
Apple, as with EVERY search engine on the planet, uses web crawlers to "read" and index websites and the data on them… if a site owner does not want their data mined, there's a standard method (robots.txt) every web admin should know about to restrict crawlers.
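For anyone who hasn't dealt with it, robots.txt is just a plain-text file served from the root of the site. As a rough sketch (the crawler names below are the user-agent tokens Apple and OpenAI document for AI-training opt-outs, Applebot-Extended and GPTBot; check the vendors' current documentation before relying on them), a site that wants to stay in search results but out of AI training datasets might publish something like:

# https://example.com/robots.txt
# Opt out of Apple Intelligence training, while regular Applebot search indexing continues
User-agent: Applebot-Extended
Disallow: /

# Opt out of OpenAI's training crawler
User-agent: GPTBot
Disallow: /

# Everyone else may crawl normally (an empty Disallow blocks nothing)
User-agent: *
Disallow:

Worth remembering that robots.txt is a request, not an enforcement mechanism - each crawler chooses whether to honor it.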
"The Wired report detailed how companies including Apple, Anthropic, and NVIDIA had used the "YouTube Subtitles" dataset for AI model training. This dataset is part of a larger collection known as "The Pile," which is compiled by the non-profit organization EleutherAI."How do they have access to train even their open source model on youTube subtitles? Aren't those copyright of the people who made the video? And even if those people signed away some rights in the YouTube T&Cs, wouldn't that mean Google control them? I assume Apple wouldn't be paying Google to access that data just to use it in their open source models.
I've always thought Apple should set up their own video sharing platform. Not to act as a serious competitor to YouTube, but just to give people the option of hosting videos somewhere that provides a better UX, especially on iPad.
"It's very clear that you don't know how AI works. Without training, AI won't work, even if it's only for your devices. How do they even get mass amounts of data to train with?"

It's very clear that you don't even know how AI works. "AI" is used incorrectly to refer to Machine Learning (Apple is the only company that still talks about Machine Learning); there are also other "AI" technologies. Have you ever heard of deep learning, diffusion models, supervised learning, weak AI, etc.? (I'm mixing types with technologies.)
"Aren't YouTube subtitles AI generated? I have to turn CC on because of the gawd awful muzak🤮 these content creators keep piping in the background.😒 It feels like I have aphasia trying to understand the YouTube captioning.😑"

The bigger creators generally generate their own subtitles (or have a company do it for them) and don't rely on the auto-generated ones, since those are usually garbage.
"Thank god for that. Training on YouTube videos from popular content creators would render Apple Intelligence pretty unintelligent."

Fully agree.
"The Wired report detailed how companies including Apple, Anthropic, and NVIDIA had used the "YouTube Subtitles" dataset for AI model training. This dataset is part of a larger collection known as "The Pile," which is compiled by the non-profit organization EleutherAI."

That doesn't really answer the question, though - how does EleutherAI get the rights to redistribute those works? Presumably either the creators or Google own those rights, and I can't imagine either of them is happy to give it all away for free.
Apple and others don't access the data directly; they just get the data from a "processor/compiler".
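In practice, "getting it from a compiler" usually just means downloading a pre-packaged corpus rather than crawling YouTube or anything else yourself. A rough sketch of what that looks like with the Hugging Face datasets library - the dataset id "some-org/pile-style-corpus" and the "text" field are illustrative placeholders, not a real published dataset:

from datasets import load_dataset

# Stream a pre-compiled corpus instead of scraping the original sources.
# "some-org/pile-style-corpus" is a made-up id used only for illustration.
corpus = load_dataset("some-org/pile-style-corpus", split="train", streaming=True)

for i, record in enumerate(corpus):
    # Pile-style records typically carry raw text plus provenance metadata.
    print(record["text"][:200])
    if i == 2:
        break

By the time a model team touches the data this way, it is several steps removed from whoever actually wrote the subtitles, which is exactly why the rights question above gets murky.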
"It's very clear that you don't even know how AI works. "AI" is used incorrectly to refer to Machine Learning (Apple is the only company that still talks about Machine Learning); there are also other "AI" technologies. Have you ever heard of deep learning, diffusion models, supervised learning, weak AI, etc.? (I'm mixing types with technologies.)"

What you mentioned makes it very clear how AI works: without data, no AI. You just proved my point after all.
So, if you don't know that, please do not state that someone else doesn't know. Just learn that AI is more than a single thing.
I think you're humanizing the AI too much. It's not a person searching knowledge "in the wild". It is a large file that has been created by a training algorithm which is given a lot of crawled data as the input. It doesn't learn anything outside of what its creators are passing along. And crucially, once training is complete, it's no longer acquiring knowledge. (Every interaction you have with it starts with a blank slate or explicit "context" given from your previous sessions/personal data.)
So the model's creators know absolutely what has been used to train it. They're generally just cagey about it, because they don't want to be sued once they admit whose copyrighted content they've used.
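To make the "static artifact" point concrete, here's a deliberately silly toy sketch in Python (a word-count dict standing in for billions of learned weights - purely illustrative, nothing like a real LLM): training bakes the crawled corpus into a frozen object, and every later "answer" is computed only from that frozen object plus whatever text you pass in at the time.

def train(crawled_corpus):
    # Stand-in for the real training run: boil the corpus down to a frozen artifact.
    model = {}
    for doc in crawled_corpus:
        for word in doc.split():
            model[word] = model.get(word, 0) + 1
    return model  # after this returns, the "model" never changes

def answer(model, prompt, context=""):
    # Inference reads only the frozen model and the text supplied right now.
    words = (context + " " + prompt).split()
    return max(words, key=lambda w: model.get(w, 0))

frozen = train(["the pile contains youtube subtitles", "subtitles are just text"])
print(answer(frozen, "what do subtitles contain?"))  # picks the word it "saw" most often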
Yep. The only way they don't know where they got their training data from is if they told their engineers, "Go ahead and trawl whatever you think is useful, but don't keep logs, because we don't want to get sued. We'll pretend the data fell off a truck!"