This most likely has to do with the fact that they trained on the GPT-2 model, which was open sourced anyway. Most LLMs use the GPT-2 architecture or a modified version of it. The data from OpenAI was used for distillation.
Nope, all GPTs, including OpenAI's, are based on the transformer architecture invented by Google. GPT-2 is a 1.5 billion parameter model trained on 8 million web pages. It is laughably bad compared to the models available now. I can run a 4 billion parameter model on my iPhone 13 Pro, which is a few years old.

DeepSeek V3 is a 671 billion parameter MoE model; they are not even in the same league. Their CoT model is a distillation from the 671 billion parameter model.
 
Hey, OpenAI, so why didn’t you “steal” your own data to train and create a better AI first?

Smells like BS.

Why would DeepSeek copy from junk?
 
Is there much point in debating about AI theft?

Industry won't respect IP.
China won't prosecute.
Seemingly, neither will the US.
Even if they did, with models made public - can't exactly wind back the clock.

It's a lawless, cut-throat industry but.. what are they gonna do, cry harder?
 
Hey, OpenAI, so why didn’t you “steal” your own data to train and create a better AI first?

Smells like BS.

Why would DeepSeek copy from junk?
I know you’re (presumably?) making a joke, but they actually do the same thing on their own models. Obviously it’s not against the terms of service when they do it, because it’s their model, but they definitely do it.
 
Since when has there been any groundbreaking innovation coming from China? It is always copy and paste, which is seen culturally as some sort of recognition of the great work of the teacher. This concept doesn’t work in a global economy, though, for obvious reasons.
A quick Google search brings up paper, gunpowder, printing, the compass, alcohol, Go and Mahjong, playing cards...
 
Interesting. Anyway it is good to see more AI models. Will be nice to see how it will all improve in the future.
 
It’s China so they’re going to steal it. That being said, it’s literally called OpenAI. And now they're going “for profit” after virtue signaling that they wouldn’t. They can sit this one out, Sam Altman especially.
 
Content on the internet is open for anyone to use. If you don't like it, then block it behind passwords and paywalls.
One, that is not correct. Just because I didn’t lock my house doesn’t give you the right to go in and take anything you want. There are terms of service. Two, OpenAI trained their model on data that was behind passwords and paywalls, so even that didn’t help.
 
Is your site available for, I don't know, me to read if I wanted to?

If it is, what are you complaining about?

Someone could have been copying your website data for years.

Only now you're concerned about it?

No, it's not, unless you know the address directly. There are/were no outside links, but crawlers were still finding them.

What I'm complaining about is that we didn't give permission for it to be used. We didn't want it scraped by search engines, and don't want it added to AI training sets. Since these AI companies think our opinions matter less than their profit, we've completely removed it from the internet.

I'm really trying to have a good faith argument here.

I am an attorney. My law firm has a website. I don't really care if AI copies it and spits it out if someone asks for law firms doing what I do in my city.

The AI has zero access to attorney-client privilege materials. That is behind doors. It is air-gapped. It is as "proprietary" as you can get, as I am literally paid to keep secrets.

The fact a robot can read my law firm's website means nothing to me. Zero. Where am I wrong and what should I be concerned about?

My wife is an attorney and she doesn't care either.

This was proprietary information, collected by us over 40+ years. We didn't want it indexed, scraped, used or copied outside of the caving community. The reasons aren't important, though they are many, but the point is that we should be able to choose whether it is used or not, and said companies should respect that.
 
Oh cry me a river, does anyone have sympathy?

Samsung tears into iPhones, Apple does the same to Galaxy phones...

Heck, food scientists and culinary researchers reverse engineer recipes all the time just to recreate them.

That’s how the world moves forward and improves.

Jog on, Altman.
 
It’s China so they’re going to steal it. That being said, it’s literally called OpenAI. And now they're going “for profit” after virtue signaling that they wouldn’t. They can sit this one out, Sam Altman especially.
OpenAI is just a name. There isn’t anything open in their offer.
People’s idea of “ethics” varies. I do not expect “ethics” from any global corporation at this point, least of all OpenAI.
Fair enough
 
By the way, for those who get riled up because DeepSeek doesn’t want to answer critical questions about Chinese politics, here’s what I got from Google Gemini when asking it whether Trump was a good president:

“I can't help with responses on elections and political figures right now. I'm trained to be as accurate as possible but I can make mistakes sometimes. While I work on improving how I can discuss elections and politics, you can try Google Search.”

Same answer for Biden, Clinton and Bush. But it was perfectly happy to give an elaborate answer on Reagan and JFK. So if you want to claim DeepSeek is Chinese censorship, Google is American censorship.
 
I never really saw the point of asking an AI for an opinion. I can see asking for something like a background briefing (and even that can be biased) but not whether someone is good or bad. We can't let these things think for us.
 
Since when has there been any groundbreaking innovation coming from China? It is always copy and paste, which is seen culturally as some sort of recognition of the great work of the teacher. This concept doesn’t work in a global economy, though, for obvious reasons.
Saying this is crazy when OpenAI’s whole business model relies on plagiarism and training off of online content without consent.
 
I am probably too old to understand distillation now. I remember the good days in the 90s.
 