There are a lot of bad takes out there, so it’s probably worth collecting some good ones.

DeepSeek-V3 was released 2024-12-26: https://api-docs.deepseek.com/news/news1226
DeepSeek-R1 was released 2025-01-20: https://api-docs.deepseek.com/news/news250120

NOTE: US President Donald Trump proposed up to 100% tariffs on Taiwanese imports (e.g., all advanced semiconductors) on Monday 1/27, and the entire tech-sector market drop that was being attributed to DeepSeek news was most likely insider trading ahead of this announcement:

I would strongly under-index any analysis that failed to take this into account. It’s also worth noting that, from a logical perspective, DeepSeek’s success should help drive additional GPU demand (cheaper training and inference make more applications economically viable, which tends to increase total compute consumption):

Technical

If you are trying to understand the latest DeepSeek models, it’s probably best to start with the first-party papers; they are well written and relatively in-depth:

They have, of course, published more:

For those looking for more on DeepSeek’s infrastructure optimizations, there is one paper, on optimizing their 10,000 PCIe A100 GPU cluster, that is not commonly cited:

For DL training, we deployed the Fire-Flyer 2 with 10,000 PCIe A100 GPUs, achieved performance approximating the DGX-A100 while reducing costs by half and energy consumption by 40%. We specifically engineered HFReduce to accelerate allreduce communication and implemented numerous measures to keep our Computation-Storage Integrated Network congestion-free. Through our software stack, including HaiScale, 3FS, and HAI-Platform, we achieved substantial scalability by overlapping computation and communication.
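HFReduce and HaiScale are DeepSeek’s in-house stack, but the core idea the paper describes, overlapping computation with allreduce communication, can be illustrated with stock `torch.distributed`. Below is a minimal sketch of that general technique (gradient hooks firing asynchronous allreduces); it is not DeepSeek’s implementation, and real systems additionally bucket gradients and average rather than sum:

```python
# Minimal sketch of overlapping backward computation with gradient
# allreduce, assuming torch.distributed is already initialized.
# Illustrates the general technique only -- not DeepSeek's HFReduce.
import torch
import torch.distributed as dist

pending = []  # handles for in-flight asynchronous allreduces

def overlap_hook(grad):
    # async_op=True returns immediately, so the rest of the backward
    # pass keeps computing while this reduction is on the wire.
    pending.append(dist.all_reduce(grad, op=dist.ReduceOp.SUM, async_op=True))
    return grad

def install_hooks(model):
    # Fire an allreduce the moment each parameter's gradient is ready,
    # instead of waiting for the whole backward pass to finish.
    for p in model.parameters():
        if p.requires_grad:
            p.register_hook(overlap_hook)

def drain():
    # Before the optimizer step, wait on any reductions still in flight.
    for h in pending:
        h.wait()
    pending.clear()
```

Production implementations such as PyTorch’s DDP additionally bucket small gradients into larger messages to amortize per-message latency; per the paper, HFReduce applies similar ideas tuned for their PCIe (non-NVLink) topology.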

There’s been a lot of misunderstanding about how much it costs to train frontier base models. For some good analysis on this, see:
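(As one concrete anchor: the DeepSeek-V3 technical report itself states the final training run took 2.788M H800 GPU-hours at an assumed $2 per GPU-hour, and explicitly excludes prior research and ablation experiments. The much-quoted headline number is just that multiplication:)

```python
# Back-of-the-envelope check of the widely quoted DeepSeek-V3 training
# cost, using the figures stated in the V3 technical report.
gpu_hours = 2.788e6       # H800 GPU-hours for the final training run
usd_per_gpu_hour = 2.00   # rental rate assumed in the report

cost = gpu_hours * usd_per_gpu_hour
print(f"${cost / 1e6:.3f}M")  # -> $5.576M
# This covers only the final run: the report explicitly excludes prior
# research, architecture ablations, and data costs.
```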

Analysis

2024-11-27 Deepseek: The Quiet Giant Leading China’s AI Race https://www.chinatalk.media/p/deepseek-ceo-interview-with-chinas

In the face of disruptive technologies, moats created by closed source are temporary. Even OpenAI’s closed source approach can’t prevent others from catching up. So we anchor our value in our team — our colleagues grow through this process, accumulate know-how, and form an organization and culture capable of innovation. That’s our moat.

Bad Takes