DeepSeek V3: A Game Changer in AI
A Chinese lab has unveiled what may be one of the most formidable “open” AI models to date. DeepSeek V3, developed by the firm DeepSeek, was released under a permissive license that allows developers to download and modify it for a wide range of applications, including commercial use.
Capabilities of DeepSeek V3
DeepSeek V3 is designed to handle an array of text-based tasks, including:
- Coding
- Translating
- Writing essays and emails from descriptive prompts
According to DeepSeek’s internal benchmarks, V3 outperforms both openly downloadable models and “closed” AI models that are accessible only through an API.
Benchmark Performance
In coding competitions on Codeforces, DeepSeek V3 has shown superior performance compared to:
- Meta’s Llama 3.1 405B
- OpenAI’s GPT-4o
- Alibaba’s Qwen 2.5 72B
Additionally, it excels on Aider Polyglot, a benchmark that tests a model’s ability to write new code that integrates with existing code.
Technical Specifications
| Feature | Specification |
|---|---|
| Processing Speed | 60 tokens/second (3× faster than V2) |
| API Compatibility | Intact |
| Model Type | Fully open-source |
| Parameters | 671B total (MoE), 37B activated per token |
| Training Data | 14.8T high-quality tokens |
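A mixture-of-experts (MoE) model activates only a subset of its parameters for each token it processes. From the figures above, the active fraction works out to roughly 5.5%; this is a quick derived check, not a figure DeepSeek itself reports:

```python
# Fraction of DeepSeek V3's parameters active per token,
# derived from the total and activated counts in the table above.
total_params = 671e9    # 671B total parameters
active_params = 37e9    # 37B activated per token

fraction = active_params / total_params
print(f"{fraction:.1%} of parameters active per token")  # prints "5.5% ..."
```

This sparsity is why a 671B-parameter model can be far cheaper to run per token than a dense model of the same total size.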
Compelling Achievements
DeepSeek claims that V3 was trained using a dataset comprising 14.8 trillion tokens. For context, 1 million tokens equal approximately 750,000 words.
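Using the article’s rough conversion (1 million tokens ≈ 750,000 words), the size of the training corpus in words can be estimated with a quick calculation; the ratio is an approximation, since the exact tokens-per-word rate depends on the tokenizer and the text:

```python
# Rough word-count estimate for a 14.8T-token training corpus,
# using the approximate conversion of 1M tokens to 750k words.
tokens_trained = 14.8e12               # 14.8 trillion tokens
words_per_token = 750_000 / 1_000_000  # ≈ 0.75 words per token

estimated_words = tokens_trained * words_per_token
print(f"≈ {estimated_words / 1e12:.1f} trillion words")  # ≈ 11.1 trillion words
```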
DeepSeek V3 is notably large, with 671 billion parameters (685 billion on AI development platform Hugging Face), significantly surpassing Llama 3.1 405B’s 405 billion parameters.
Despite its scale, DeepSeek’s training approach was seemingly economical. The model was trained over about two months in a data center equipped with Nvidia H800 GPUs, and the company reportedly spent around $5.5 million on development, a fraction of the cost of developing models like GPT-4.
Limitations and Regulatory Context
However, the model’s responses on political topics are notably constrained. For instance, it refuses to answer questions about sensitive subjects such as Tiananmen Square. As a Chinese company, DeepSeek is subject to oversight by China’s internet regulators, who require that its models reflect “core socialist values.” This regulatory environment often results in models declining to address contentious subjects.
DeepSeek’s Future
DeepSeek recently introduced DeepSeek-R1, a response to OpenAI’s o1 reasoning model. The organization, backed by High-Flyer Capital Management, aims to develop superintelligent AI.
In an interview, High-Flyer’s founder Liang Wenfeng noted that the closed-source nature of some AI models like OpenAI’s is a temporary barrier, asserting that it hasn’t hindered others from advancing.