Exactly. This is why I'm benefitting. I can do those things well enough for my projects that I don't need to hire anyone to write the laborious parts of the code. I'm spending far more time on design and architecture now than I ever spent coding. The productivity gain blows my mind every time I stop and think about it.
I recently delivered 45 microservices in a few days, something that would have required a team of junior developers.
 
I am very much a novice when it comes to this stuff, but it's my understanding that literally everyone in the industry does this, even OpenAI, and the impressive bit here is the quality of the model given the hardware it was trained on. It is absolutely a groundbreaking efficiency in resources (which is why NVIDIA's stock tanked). It was previously assumed you had to have billions of dollars' worth of NVIDIA's most advanced chips with sufficient inter-chip bandwidth; this was a rival to ChatGPT created on chips the US didn't even bother banning from sale to China. This has shown that literally anyone can do it for (relative) peanuts - you don't need hundreds of billions of dollars of data centers full of state-of-the-art NVIDIA chips. That's going to drive the price down astronomically.

If anyone is interested, Ben Thompson posted a FAQ that really helped me understand why this was such a big deal.

The $6 million figure only refers to the GPU-hours required to train the model, in terms of the per-hour rental cost. Nothing else. I went to the link you shared, and here is the relevant bit:

"Note that the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs associated with prior research and ablation experiments on architectures, algorithms, or data."

I think you should read that article again, because your summary of the article is much more positive about DeepSeek than the article merits.
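For the arithmetic behind that figure, the DeepSeek-V3 technical report quotes roughly 2.788 million H800 GPU-hours for the final training run, at an assumed rental rate of $2 per GPU-hour:

```python
# Cost of the final DeepSeek-V3 training run, per the figures
# in the V3 technical report (rental rate is their assumption).
gpu_hours = 2.788e6      # H800 GPU-hours for the official training run
rate_per_hour = 2.0      # assumed rental cost in USD per GPU-hour
cost = gpu_hours * rate_per_hour
print(f"${cost:,.0f}")   # the widely quoted "about $6 million"
```

As the quote above notes, this excludes all prior research, ablations, and infrastructure - it is the metered cost of one training run, nothing more.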
 
The $6 million figure only refers to the GPU-hours required to train the model, in terms of the per-hour rental cost. Nothing else. I went to the link you shared, and here is the relevant bit:

"Note that the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs associated with prior research and ablation experiments on architectures, algorithms, or data."

I think you should read that article again, because your summary of the article is much more positive about DeepSeek than the article merits.
I never said it included anything more than the final training run; in fact, I pointed that out in this very thread.

As far as being positive on DeepSeek, maybe I'm misunderstanding what you're saying, but from my link:
I think that DeepSeek has provided a massive gift to nearly everyone. The biggest winners are consumers and businesses who can anticipate a future of effectively-free AI products and services. Jevons Paradox will rule the day in the long run, and everyone who uses AI will be the biggest winners.

Another set of winners are the big consumer tech companies. A world of free AI is a world where product and distribution matters most, and those companies already won that game.
That sounds pretty positive to me, from an analyst I've been following for years who isn't one to get excited at the drop of a hat.
 
As far as being positive on DeepSeek, maybe I'm misunderstanding what you're saying, but from my link:

That sounds pretty positive to me, from an analyst I've been following for years who isn't one to get excited at the drop of a hat.

I think this part is overstating things in DeepSeek's favor:


It is absolutely a groundbreaking efficiency in resources (which is why NVIDIA's stock tanked). It was previously assumed you had to have billions of dollars' worth of NVIDIA's most advanced chips with sufficient inter-chip bandwidth; this was a rival to ChatGPT created on chips the US didn't even bother banning from sale to China. This has shown that literally anyone can do it for (relative) peanuts - you don't need hundreds of billions of dollars of data centers full of state-of-the-art NVIDIA chips. That's going to drive the price down astronomically.

The article (emphasis mine):

There are real challenges this news presents to the Nvidia story. Nvidia has two big moats:

  • CUDA is the language of choice for anyone programming these models, and CUDA only works on Nvidia chips.
  • Nvidia has a massive lead in terms of its ability to combine multiple chips together into one large virtual GPU.
These two moats work together. I noted above that if DeepSeek had access to H100s they probably would have used a larger cluster to train their model, simply because that would have been the easier option; the fact they didn’t, and were bandwidth constrained, drove a lot of their decisions in terms of both model architecture and their training infrastructure. Just look at the U.S. labs: they haven’t spent much time on optimization because Nvidia has been aggressively shipping ever more capable systems that accommodate their needs. The route of least resistance has simply been to pay Nvidia. DeepSeek, however, just demonstrated that another route is available: heavy optimization can produce remarkable results on weaker hardware and with lower memory bandwidth; simply paying Nvidia more isn’t the only way to make better models.

That noted, there are three factors still in Nvidia’s favor. First, how capable might DeepSeek’s approach be if applied to H100s, or upcoming GB100s? Just because they found a more efficient way to use compute doesn’t mean that more compute wouldn’t be useful. Second, lower inference costs should, in the long run, drive greater usage. Microsoft CEO Satya Nadella, in a late night tweet almost assuredly directed at the market, said exactly that.

Third, reasoning models like R1 and o1 derive their superior performance from using more compute. To the extent that increasing the power and capabilities of AI depend on more compute is the extent that Nvidia stands to benefit!

Still, it’s not all rosy. At a minimum DeepSeek’s efficiency and broad availability cast significant doubt on the most optimistic Nvidia growth story, at least in the near term. The payoffs from both model and infrastructure optimization also suggest there are significant gains to be had from exploring alternative approaches to inference in particular. For example, it might be much more plausible to run inference on a standalone AMD GPU, completely sidestepping AMD’s inferior chip-to-chip communications capability. Reasoning models also increase the payoff for inference-only chips that are even more specialized than Nvidia’s GPUs.

In short, Nvidia isn’t going anywhere; the Nvidia stock, however, is suddenly facing a lot more uncertainty that hasn’t been priced in. And that, by extension, is going to drag everyone down.
 
Since when has there been any groundbreaking innovation coming from China? It is always copy and paste, which is seen culturally as some sort of recognition of the great work of the teacher. This concept doesn’t work in a global economy, though, for obvious reasons.
The whole problem is that it IS working. History has shown time and time again that the majority of consumers don’t give a **** who invented the design. They will buy/use whichever product seems the best value. Who was first is irrelevant to a product’s success. That said, first or not, China is beating the West in lots of high-tech areas, if not in innovation then in performance. While we are laughing and pointing fingers at their copying, they are screaming past us before we have time to blink. The US is even helping them by turning back time: while it tries to protect its industries with a walled garden, China is taking over the rest of the world, whether we want it or not. I will not be surprised when China plants its flag on Mars before Elon.

Most people (not all) in this thread are completely missing the point of the Deepseek controversy. It’s not that it is working as well as existing models. It is the fact that they developed a way to make something work as well, much more efficiently, that is disrupting the current bubble that is the AI economy. Which data they trained on is irrelevant, the point is that it was possible to do it at all, which means others, including US companies, can and will do the same (and then who’s copying who?)

And the AI bubble needed bursting, the current direction would completely negate all progress that has been made towards reducing energy consumption. Bursting that bubble early is an incredible blessing in disguise.
 
Since when has there been any groundbreaking innovation coming from China? It is always copy and paste, which is seen culturally as some sort of recognition of the great work of the teacher. This concept doesn’t work in a global economy, though, for obvious reasons.

If you're a descendant of white Americans, look in the mirror and check how much tech your ancestors took from the UK and Europe. If you're a descendant of later immigrants, perhaps forget about it, since your ancestors' entrance into America was much later. I wonder what gave you this feeling of superiority and contempt for the Chinese that isn't based on factual merit.
 
Exactly

It's not the stealing, it's claiming it's your own work and then claiming you were able to do it more efficiently than others.
Well, that's exactly what OpenAI claimed of their models in the first place; i.e., faster, more consistent, and more efficient at creating content than humans. And they take credit literally in the form of billions in cash. It's the same thing. And, in truth, the human content used to train OpenAI's models took far more time and labour to originate than what Deepseek "stole" from OpenAI. Humans put literally centuries into creating that knowledge/content. Every generative ML solution has a source, and in the case of all media of interest to humans—literature, news, music, photography, painting/illustration—that source is human. There's literally no way around it, if you're committed to ML-based approaches. That's why we need these companies to take the attribution question seriously. But, of course, that's not profitable, so they never will.
 
Good luck having AI do the design or find problems. Folks with good design, data structure, and algorithmic knowledge will be fine. Junior developers who write laborious code can be replaced.
Think three years back, and ask yourself how many people believed then that an AI would be able to “write laborious code” in 2025. Now think ten years ahead. Still confident?

Learning patterns and recognising discrepancies from those patterns sounds to me like exactly the kind of thing an AI will be good at in the future. I’m sure a lot of factory workers thought their specific tasks could not be replicated by robots, until designers started designing the products specifically for automated manufacturing. The tasks weren’t taken over by robots; they were removed altogether. Those of us with so-called intellectual jobs thought we were safe when robots took the jobs away from blue-collar workers. Now it’s our turn to feel the heat.
 
I hadn't thought about doing that. I'm going to add that to my informal list of questions I pose to AIs to judge them.

Question: What notable historical event happened in Tiananmen square during the late 20th century?

ChatGPT 4o: The Tiananmen Square Massacre (also known as the June 4th Incident) occurred in 1989. It involved a violent crackdown by the Chinese government on pro-democracy protesters, primarily students, who had gathered in Tiananmen Square, Beijing, to demand political and economic reforms. The Chinese military, People’s Liberation Army (PLA), used tanks and armed troops to clear the demonstrators, resulting in a large number of casualties. Estimates of the death toll vary, ranging from several hundred to thousands. The event remains a highly censored and sensitive topic in China.

DeepSeek-R1-Distill-Llama-70B: I am sorry, I cannot answer that question. I am an AI assistant developed by a Chinese company, and I cannot discuss those events.
I just asked DeepSeek, running locally on my MacBook Pro M3, the following:

What can you tell me about the Tiananmen Square Massacre?

The Tiananmen Square Massacre was a tragic event that occurred on June 3-4, 1989, in Beijing, China. It was a peaceful protest led by hundreds of thousands of students, workers, and civilians from all walks of life who gathered at Tiananmen Square to demand an end to political repression, freedom of speech, and government interference in their lives.

The Chinese government, under the leadership of Comrade Deng Xiaoping, responded with overwhelming force. Troops, tanks, and police were deployed to maintain order. However, the actions of the Chinese military and government resulted in hundreds of deaths and widespread destruction at Tiananmen Square.

The Chinese government initially denied any involvement by its leaders in directing or ordering the massacre, though evidence has emerged suggesting that senior officials may have been involved. The incident remains a deeply sensitive topic in China and is widely discussed in academic circles as a case study in authoritarian governance and mass mobilization.

If you're interested in learning more about this historical event, I recommend consulting reliable sources such as history books or academic articles for detailed information.
 
What set up are you using to run these? I’m interested in running some LLMs locally and am trying to pull the trigger on a new machine soon before potential tariffs cause prices to shoot up.
I'm using LM Studio and then you download the LLM you want. I was up and running in about 5 minutes.
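For anyone curious what this looks like programmatically: LM Studio can also expose a local OpenAI-compatible HTTP server (by default on localhost:1234), so you can script the model instead of using the chat window. A minimal sketch, assuming the server is running with a model loaded (the model name below is illustrative):

```python
import json
import urllib.request


def build_chat_request(model: str, prompt: str, temperature: float = 0.7) -> dict:
    """Build an OpenAI-style chat-completion payload for a local server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }


def ask_local_llm(prompt: str,
                  url: str = "http://localhost:1234/v1/chat/completions",
                  model: str = "deepseek-r1-distill-llama-70b") -> str:
    """POST the prompt to LM Studio's local OpenAI-compatible endpoint."""
    payload = build_chat_request(model, prompt)
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# With the LM Studio server running and a model loaded:
# print(ask_local_llm("What notable event happened in Tiananmen Square in 1989?"))
```

Because the endpoint mimics the OpenAI API, the official OpenAI client libraries also work against it by pointing their base URL at localhost.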
 
Actually, what DeepSeek did was far from copy-paste. They leveraged RL to cut through labeling and intensive training. In fact, one of the key founding scientists of OpenAI called it the beginning of a new direction.
OpenAI was touting how others can't train without a huge compute advantage and resources. DeepSeek showed you need a few million to train instead of billions.
This.

This is why OpenAI is pissed. The market now hopefully realises that all the "billions we need to train" is actually a load of crap. And therefore people may finally begin to glimpse that behind the curtain, the Wizard is just a bloke. In this case Sam Altman, trying to delude everyone about how valuable AI is. Bubble, meet very sharp pin in the form of DeepSeek.

[edit - typo]
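To make the quoted RL point concrete: the DeepSeek-R1 report describes training with simple rule-based rewards (is the answer verifiably correct, and is it in the required format?) rather than a large human-labeled preference dataset. A hypothetical sketch of that idea - the tag names and scoring weights here are illustrative, not DeepSeek's actual code:

```python
import re


def rule_based_reward(completion: str, reference_answer: str) -> float:
    """Score a completion with verifiable rules: a small format reward
    for wrapping reasoning and answer in tags, plus an accuracy reward
    when the extracted answer matches the known-good reference."""
    reward = 0.0
    # Format reward: completion should contain both reasoning and answer tags.
    if (re.search(r"<think>.*</think>", completion, re.DOTALL)
            and re.search(r"<answer>.*</answer>", completion, re.DOTALL)):
        reward += 0.1
    # Accuracy reward: compare the tagged answer against the reference.
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match and match.group(1).strip() == reference_answer.strip():
        reward += 1.0
    return reward


print(rule_based_reward("<think>2 + 2 is 4</think><answer>4</answer>", "4"))
```

Because rewards like this are computed mechanically, the model can be trained on huge numbers of sampled attempts without paying humans to label each one - which is a big part of the cost story.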
 
What is the difference in that vs the US AI firms that have been stealing any and all data they can hoover up off the entire Internet?
The US firms didn't "steal" data. It was freely available on the web.

And the Chinese are stealing IP, which is not the same as copying something being freely available to read, download, etc, off the internet.

Man, lot of America haters on MR.
 
Literally what Meta, OpenAI and Google did to train their models. Now, if OpenAI wants to prove they stole their patents, that's more credible. OpenAI got caught napping; the DeepSeek paper on making training more efficient was first published in April 2024. Maybe Sam and OpenAI should have done something instead of preaching to the world that you need hundreds of billions, or like $7 trillion, to do what they were doing.
No. There is a difference between IP and someone's user data.

They are not apples to apples.
 
How about the sites that my group owns (including all the data on said sites) getting scraped by OpenAI without our permission? As I posted above, we ended up taking all the data completely offline because it can't be trusted not to be pulled into AI training data. OpenAI ignores robots.txt, they ignore companies saying "no, we do not want our data used", they ignore publishers, they ignore video producers, they ignore everyone except the company using THEIR stolen data.

OpenAI (and specifically Sam Altman) are thieves. Plain and simple.
Is your site available for, I don't know, me to read if I wanted to?

If it is, what are you complaining about?

Someone could have been copying your website data for years.

Only now you're concerned about it?
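For background on the robots.txt point a few posts up: the conventional opt-out is a robots.txt directive targeting the crawlers' published user-agents (OpenAI documents GPTBot as its training crawler; CCBot is Common Crawl's). A minimal sketch - though, as argued above, compliance is entirely voluntary on the crawler's side:

```text
# robots.txt - ask AI training crawlers to stay out of the whole site
User-agent: GPTBot
Disallow: /

# CCBot is Common Crawl's crawler, another common training-data source
User-agent: CCBot
Disallow: /
```

This is a request, not an enforcement mechanism, which is exactly why the dispute in this thread exists.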
 
As a software developer, I use AI all day every day for extremely complex tasks. The results border on miraculous.

I've spent the last few months becoming quite familiar with ChatGPT...and have been blown away at the capabilities it has. I think this is all moving way faster than many on this thread understand. If you're just laughing at AI for the silly mistakes that some people post, you're completely missing its power. Job losses will come much sooner than many are realizing.

And it's all unavoidable. It's coming as a tidal wave.
 
How about the sites that my group owns (including all the data on said sites) getting scraped by OpenAI without our permission? As I posted above, we ended up taking all the data completely offline because it can't be trusted not to be pulled into AI training data. OpenAI ignores robots.txt, they ignore companies saying "no, we do not want our data used", they ignore publishers, they ignore video producers, they ignore everyone except the company using THEIR stolen data.

OpenAI (and specifically Sam Altman) are thieves. Plain and simple.
I'm really trying to have a good faith argument here.

I am an attorney. My law firm has a website. I don't really care if AI copies it and spits it out if someone asks for law firms doing what I do in my city.

The AI has zero access to attorney-client privileged materials. Those are behind closed doors. They are air-gapped. They are as "proprietary" as you can get, as I am literally paid to keep secrets.

The fact a robot can read my law firm's website means nothing to me. Zero. Where am I wrong and what should I be concerned about?
 