Summary:
DeepSeek V3 outperforms comparable models from Meta and OpenAI.
The model has 671 billion parameters and was trained at a cost of US$5.58 million.
Achieved results with fewer computing resources than competitors.
Andrej Karpathy praises DeepSeek's efficiency on social media.
Highlights the progress of Chinese AI firms despite US sanctions.
DeepSeek's Groundbreaking AI Model
Chinese start-up DeepSeek has made headlines with its latest release, DeepSeek V3, a large language model (LLM) that outperforms comparable models from industry giants such as Meta Platforms and OpenAI.
The model has 671 billion parameters and was trained in just two months at a cost of US$5.58 million. What sets it apart is that it achieved these results with significantly fewer computing resources than its competitors, pointing to an unusually efficient approach to training.
What is an LLM?
LLMs are the backbone of generative AI services such as ChatGPT. More parameters generally allow a model to capture more complex patterns in its training data, which typically leads to more accurate predictions.
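For intuition, here is a minimal sketch in PyTorch (a toy network, unrelated to DeepSeek's actual architecture) showing that a model's parameter count, the figure behind headline numbers like 671 billion, is simply the total number of learned weights and biases:

```python
import torch.nn as nn

# A toy two-layer network, purely to illustrate what "parameters" means:
# the learned weight matrices and bias vectors of each layer.
model = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.ReLU(),
    nn.Linear(4096, 1024),
)

# Headline figures such as "671 billion parameters" are computed the same way,
# just over a vastly larger network.
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,} parameters")  # 8,393,728
```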
Industry Reaction
Computer scientist Andrej Karpathy, a founding member of OpenAI, acknowledged the achievement in a post on social media platform X, stating, “DeepSeek making it look easy … with an open weights release of a frontier-grade LLM trained on a joke of a budget.”
Open Weights Explained
The term open weights refers to releasing only a model's pretrained parameters (its weights). Third parties can then run the model for inference and fine-tune it, while the training code and datasets remain private.
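In practice, using an open-weights release can look like the sketch below, assuming the Hugging Face transformers library; the model ID is illustrative, and actually running a 671-billion-parameter model requires multi-GPU hardware, so treat this as a schematic rather than a working recipe:

```python
# Minimal sketch of open-weights usage with Hugging Face transformers.
# Only the pretrained weights are downloaded; the training code and
# datasets behind them are not part of the release.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V3"  # illustrative open-weights checkpoint ID
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Inference: generating text needs only the released weights.
inputs = tokenizer("Explain open weights in one sentence.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```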
[Image: The chatbot icons of DeepSeek and OpenAI's ChatGPT displayed on a smartphone screen. Photo: Shutterstock]
The Bigger Picture
DeepSeek’s achievement highlights the remarkable progress of Chinese AI firms. Despite facing US sanctions that restrict access to advanced semiconductors, they continue to innovate and challenge established players in the market.