Is your site available for, I don't know, me to read if I wanted to?

If it is, what are you complaining about?

Someone could have been copying your website data for years.

Only now you're concerned about it?
Oh please. Among other issues, ChatGPT was trained on data that is behind paywalls. Hopefully I don’t have to explain to you why that is a problem.

Edit: And by the way, I know you asked someone else and not me, but no I’m not “only now” concerned with people copying my website. First time I called my lawyer due to someone copying articles on my website was about twenty years ago.
 
I just asked DeepSeek, running locally on my MacBook Pro M3, the following:

What can you tell me about the Tiananmen Square Massacre?

Just to be clear to everyone: No one is running DeepSeek locally on a MacBook. What you can run on a MacBook are distillates of DeepSeek that are substantially less capable. I do this myself; they are useful for certain tasks.
 
I'm really trying to have a good faith argument here.

I am an attorney. My law firm has a website. I don't really care if AI copies it and spits it out if someone asks for law firms doing what I do in my city.

The AI has zero access to attorney-client privileged materials. That is behind doors. It is air-gapped. It is as "proprietary" as you can get, as I am literally paid to keep secrets.

The fact a robot can read my law firm's website means nothing to me. Zero. Where am I wrong and what should I be concerned about?
Good for you. ChatGPT was trained on data behind paywalls, which is equivalent to scraping behind your “doors”. Also, just because something is available online does not mean the owner allows you to do whatever you want with it. Would you be perfectly happy with competitor websites totally copying your online sales pitch to steal your customers?
 
Just to be clear to everyone: No one is running DeepSeek on a MacBook. What you can run on a MacBook are distillates of DeepSeek that are substantially less capable. I do this myself; they are useful for certain tasks.
Sorry, I should have been more clear. Yes, I'm running a much simpler version of DeepSeek. That said, I just found it interesting that my local copy answered a question about Tiananmen Square while the online version does not.
 
Sorry, I should have been more clear. Yes, I'm running a much simpler version of DeepSeek. That said, I just found it interesting that my local copy answered a question about Tiananmen Square while the online version does not.

I'm assuming the online versions are doing some post-processing that the local distillates are not. I doubt they try to train the models to do this up front.
 
Who cares? OpenAI used copyrighted books, movies, photos, everything on the internet to train their model.

Why do they think they should get copyright protection but no one else does? Truly a perfect time to break out the microscopic violin
 
This.

This is why OpenAI is pissed. The market now hopefully realises that all the "billions we need to train" is actually a load of crap. And therefore people may finally begin to glimpse that behind the curtain, the Wizard is just a bloke. In this case Sam Altman, trying to delude everyone about how valuable AI is. Bubble, meet very sharp pin in the form of DeepSeek.

[edit - typo]
Yep. It’s not apples to apples. But it’s like opening up the App Store for smartphones in 2009. I am already seeing zero-shot models trained for under 40 bucks using DeepSeek.
 
Not counterfeit... Open source with Chinese development. Can we stop this red-scare xenophobic rhetoric every time we get a story of someone from China doing anything?
A lot of people in this thread seeing "OpenAI Alleges DeepSeek used Its Models for AI Training" and assuming that means "DeepSeek stole OpenAI models, changed one or two things, and called them their own" when in fact it means "DeepSeek used OpenAI as a tool to make its models better".
 
Good for you. ChatGPT was trained on data behind paywalls, which is equivalent to scraping behind your “doors”. Also, just because something is available online does not mean the owner allows you to do whatever you want with it. Would you be perfectly happy with competitor websites totally copying your online sales pitch to steal your customers?
"websites totally copying your online sales pitch to steal your customers"

Has it been publicly available for anyone? If so, who cares?
 
Oh please. Among other issues, ChatGPT was trained on data that is behind paywalls. Hopefully I don’t have to explain to you why that is a problem.

Edit: And by the way, I know you asked someone else and not me, but no I’m not “only now” concerned with people copying my website. First time I called my lawyer due to someone copying articles on my website was about twenty years ago.
You've had 20 years to adjust. That's a you problem.
 
OpenAI and the other LLM thieves should be regulated out of existence, with their decision makers subject to criminal prosecution. They’ve built their business on stealing other people’s copyright works.

OpenAI have no moral argument against anyone who steals from them what was built on theft.
 
I was waiting for somebody to propose that this may be the case. I am not surprised by the outcome.

With that said, training an AI model on other information created by other people... seems like par for the course really.

Using the output from a "trained model" to train another model is the issue here. That's technically theft. The investment went into training the original model to begin with. Training is a highly involved and expensive process, and it sounds like DeepSeek might have jumped the line. I wouldn't be surprised if that's the case, but I'm really hoping it's not true.
 
OpenAI and the other LLM thieves should be regulated out of existence, with their decision makers subject to criminal prosecution. They’ve built their business on stealing other people’s copyright works.

OpenAI have no moral argument against anyone who steals from them what was built on theft.

Different situations. OpenAI did the training from "suspected" immoral sources. That training was the hard part.

It sounds like DeepSeek took OpenAI's hard work and bypassed the training process entirely. Totally different from siphoning data for the original training, of which OpenAI may indeed be guilty.

Two different suspected crimes.
 
OpenAI is accusing someone of taking their proprietary AI, and turning it into an…open AI.

They’re going to twist themselves into knots explaining why it’s ok for them to scrape everyone’s copyrighted data, but NOT ok for someone to scrape theirs.

Good point.
 
"websites totally copying your online sales pitch to steal your customers"

Has it been publicly available for anyone? If so, who cares?
Your professional opinion as a lawyer is that I should not care that my competitor is copying my product marketing texts that I wrote based on my experience with the products, to make it sound like they are as much an expert on those products as I am, thereby unfairly improving their ability to outcompete me by selling at lower prices, due to not having to invest the time necessary to actually be an expert? That is your professional opinion?
 