Key takeaways
Chinese artificial intelligence developer DeepSeek has revealed that training its breakthrough R1 reasoning model cost just $294,000, a fraction of what U.S. rivals reportedly spend on similar AI systems.
The disclosure, published Wednesday in the peer-reviewed journal Nature, marks the first time the Hangzhou-based company has provided an official training-cost estimate for its flagship model. The figure is likely to reignite debate over China's position in the global artificial intelligence race, given the stark cost differential with Western competitors.
Sam Altman, CEO of U.S. AI giant OpenAI, said in 2023 that what he called "foundational model training" had cost "much more" than US$100 million, though his company has not given detailed figures for any of its releases. This suggests DeepSeek's approach may be orders of magnitude more cost-effective than traditional training methods employed by American AI companies.
Revolutionary training approach
The Nature paper, which lists DeepSeek founder Liang Wenfeng as a co-author, reveals that R1 was trained using a novel "pure reinforcement learning" approach.
DeepSeek's major innovation, and the apparent source of its savings, was an automated form of trial and error known as pure reinforcement learning. Unlike conventional training pipelines that rely on human-annotated data and demonstrations, DeepSeek's system learned through rewards and penalties tied to answer accuracy.
The process rewarded the model for reaching correct answers, rather than teaching it to follow human-selected reasoning examples. This methodology allowed the model to develop its own problem-solving strategies, including self-verification and reflection capabilities.
Carnegie Mellon University assistant professor Daphne Ippolito and PhD student Yiming Zhang explain the reinforcement method by comparing it to a child playing a video game: "As the child navigates their avatar through the game world, they learn through trial and error that some actions (such as collecting gold coins) earn points, whereas others (such as running into enemies) set their score back to zero. In a similar vein, DeepSeek-R1 was awarded a high score when it answered questions correctly and a low score when it gave wrong answers."
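The outcome-only reward scheme described above can be sketched with a toy example: a "policy" over candidate answers to small addition problems, nudged up when a sampled answer turns out to be correct and down when it is wrong, with no worked examples ever shown to the learner. Everything here (the questions, reward values, and update rule) is an illustrative simplification, not DeepSeek's actual training setup.

```python
import random

# Toy sketch of outcome-based reinforcement learning: the learner is scored
# only on whether its final answer is correct, never shown reasoning examples.
# All names and numbers are illustrative, not DeepSeek's real recipe.

random.seed(0)

# Questions: (a, b, correct answer) for small additions.
QUESTIONS = [(a, b, a + b) for a in range(5) for b in range(5)]

# "Policy": a preference weight for each candidate answer per question.
weights = {(a, b): {c: 1.0 for c in range(9)} for a, b, _ in QUESTIONS}

def sample_answer(a, b):
    """Sample an answer in proportion to its current preference weight."""
    cands = weights[(a, b)]
    r = random.uniform(0, sum(cands.values()))
    for ans, w in cands.items():
        r -= w
        if r <= 0:
            return ans
    return ans

def train(epochs=200, lr=0.5):
    """Reward correct answers, lightly penalize wrong ones."""
    for _ in range(epochs):
        for a, b, truth in QUESTIONS:
            ans = sample_answer(a, b)
            reward = 1.0 if ans == truth else -0.1  # outcome-only signal
            new_w = weights[(a, b)][ans] + lr * reward
            weights[(a, b)][ans] = max(0.01, new_w)  # keep weights positive

def accuracy(trials=100):
    hits = 0
    for _ in range(trials):
        a, b, truth = random.choice(QUESTIONS)
        hits += sample_answer(a, b) == truth
    return hits / trials

before = accuracy()
train()
after = accuracy()
print(f"accuracy before: {before:.2f}, after: {after:.2f}")
```

The point of the sketch is the video-game dynamic from the analogy: correct answers are reinforced each time they happen to be sampled, so the policy gradually concentrates on them without ever seeing a human-selected solution.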
Hardware and technical details
According to the paper, the reasoning-focused R1 model cost US$294,000 to train on a cluster of 512 Nvidia H800 chips.
The H800 chips were specifically designed by Nvidia for the Chinese market after the U.S. implemented export controls in October 2022, prohibiting the sale of more powerful H100 and A100 processors to China.
In a supplementary information document accompanying the Nature article, DeepSeek acknowledged for the first time that it does own A100 chips, which it used in preparatory stages of development. "Regarding our research on DeepSeek-R1, we utilized the A100 GPUs to prepare for the experiments with a smaller model," the researchers wrote.
After this initial phase, R1 was trained for a total of 80 hours on the 512-chip H800 cluster, they added.
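Taken at face value, those figures imply a rough cost per GPU-hour. The quick check below uses only the numbers reported above; the derived hourly rate is our arithmetic, not a figure DeepSeek published.

```python
# Back-of-the-envelope check on the reported R1 training figures.
# The US$294,000 cost, 512 H800 GPUs, and 80 hours come from the article;
# the implied hourly rate is a derived illustration, not a published number.
total_cost_usd = 294_000
num_gpus = 512
hours = 80

gpu_hours = num_gpus * hours                      # 512 * 80 = 40,960 GPU-hours
cost_per_gpu_hour = total_cost_usd / gpu_hours    # ~ $7.18 per GPU-hour
print(gpu_hours, round(cost_per_gpu_hour, 2))
```

Note that this covers only the final reinforcement-learning run on the H800 cluster; the preparatory experiments on A100 hardware mentioned above are not included in the reported figure.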
Addressing distillation allegations
The Nature publication directly addresses longstanding allegations that DeepSeek copied outputs from competing AI systems. Soon after R1's release, speculation swirled that the company had leaned on rival outputs, particularly OpenAI's, to accelerate training; DeepSeek has now flatly denied that charge.
In correspondence with referees, DeepSeek insisted that R1 did not copy reasoning examples generated by OpenAI. The company emphasized that its breakthrough came from developing efficient reinforcement learning techniques rather than learning from competitor outputs.
However, DeepSeek did acknowledge that some training data for its base model may have inadvertently included web content containing AI-generated responses.
Market impact and expert views
When DeepSeek released what it said were lower-cost AI systems in January 2025, the announcement sent shockwaves through global financial markets: investors dumped tech stocks on fears that the new models could threaten the dominance of AI leaders, including Nvidia.
Lewis Tunstall, a machine-learning engineer at Hugging Face who reviewed the Nature paper, welcomed the scrutiny. "This is a very welcome precedent," he says.
The model's influence on the AI research community has been substantial. R1 has been "quite influential" among AI researchers, says Huan Sun, an AI researcher at Ohio State University. "Almost all work in 2025 so far that conducts reinforcement learning in LLMs might have been inspired by R1 one way or another."
Regulatory and geopolitical context
U.S. officials have previously questioned some of DeepSeek's statements about its development costs and technology usage. U.S. officials told Reuters in June that DeepSeek has access to "large volumes" of H100 chips that were procured after U.S. export controls were implemented. Nvidia told Reuters at the time that DeepSeek has used lawfully acquired H800 chips, not H100s.
Reuters has previously reported that one reason DeepSeek was able to attract the brightest minds in China was because it was one of the few domestic companies to operate an A100 supercomputing cluster.
R1 is designed to excel at 'reasoning' tasks such as mathematics and coding, and is a cheaper rival to tools developed by US technology firms. As an 'open weight' model, it is available for anyone to download, and it is the most popular such model on the AI community platform Hugging Face to date, having been downloaded 10.9 million times.
In a challenge to complete scientific tasks such as analyzing and visualizing data, known as ScienceAgentBench, Sun and colleagues found that although R1 was not first for accuracy, it was one of the best models in terms of balancing ability with cost.
Historic peer review publication
The publication in Nature represents a significant milestone for the AI industry: R1 is thought to be the first major LLM to undergo the peer-review process. This transparency sets a new standard for AI model documentation and verification.
Supporters of the process argue that it matters well beyond this one model: "Peer reviews relying on independent researchers is a way to dial back hype in the AI industry. Claims that cannot be verified are a real risk for society, given how ubiquitous this technology has become."
The peer review process prompted DeepSeek to release unprecedented details about its training methodology, data sources, and safety measures, providing valuable insights for the broader AI research community.
Since its initial release, R1 has maintained its position as the most downloaded open-source reasoning model, with continued strong adoption among researchers and developers worldwide seeking cost-effective alternatives to proprietary AI systems.