DeepSeek V3: A Game Changer in AI
A Chinese lab has unveiled what may be one of the most formidable “open” AI models to date. DeepSeek V3, developed by the firm DeepSeek, was released under a permissive license that allows developers to download and modify it for a wide range of applications, including commercial use.
Capabilities of DeepSeek V3
DeepSeek V3 is designed to handle an array of text-based tasks, including:
- Coding
- Translating
- Writing essays and emails from descriptive prompts
According to DeepSeek’s internal benchmarks, V3 outperforms both openly downloadable models and “closed” AI models that are accessible only through an API.
Benchmark Performance
In coding competitions on Codeforces, DeepSeek V3 has shown superior performance compared to:
- Meta’s Llama 3.1 405B
- OpenAI’s GPT-4o
- Alibaba’s Qwen 2.5 72B
Additionally, it excels on Aider Polyglot, a benchmark that tests a model’s ability to write new code that integrates with existing code.
Technical Specifications
| Feature | Specification |
|---|---|
| Processing Speed | 60 tokens/second (3× faster than V2) |
| API Compatibility | Intact |
| Model Type | Fully open-source |
| Parameters | 671B total (MoE), 37B activated per token |
| Training Data | 14.8T high-quality tokens |
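A mixture-of-experts (MoE) model activates only a subset of its parameters for each token it processes. From the figures above, the active fraction works out to roughly 5.5%; this is a quick derived check, not a figure DeepSeek itself reports:

```python
# Fraction of DeepSeek V3's parameters active per token,
# derived from the total and activated counts in the table above.
total_params = 671e9    # 671B total parameters
active_params = 37e9    # 37B activated per token

fraction = active_params / total_params
print(f"{fraction:.1%} of parameters active per token")  # prints "5.5% ..."
```

This sparsity is why a 671B-parameter model can be far cheaper to run per token than a dense model of the same total size.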
Compelling Achievements
DeepSeek claims that V3 was trained using a dataset comprising 14.8 trillion tokens. For context, 1 million tokens equal approximately 750,000 words.
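Using the article’s rough conversion (1 million tokens ≈ 750,000 words), the size of the training corpus in words can be estimated with a quick calculation; the ratio is an approximation, since the exact tokens-per-word rate depends on the tokenizer and the text:

```python
# Rough word-count estimate for a 14.8T-token training corpus,
# using the approximate conversion of 1M tokens to 750k words.
tokens_trained = 14.8e12               # 14.8 trillion tokens
words_per_token = 750_000 / 1_000_000  # ≈ 0.75 words per token

estimated_words = tokens_trained * words_per_token
print(f"≈ {estimated_words / 1e12:.1f} trillion words")  # ≈ 11.1 trillion words
```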
DeepSeek V3 is notably large, with 671 billion parameters (685 billion on AI development platform Hugging Face), significantly surpassing Llama 3.1 405B’s 405 billion parameters.
Despite its scale, DeepSeek’s training approach was seemingly economical. The model was trained over about two months in a data center equipped with Nvidia H800 GPUs, and the company reportedly spent around $5.5 million on development, a fraction of the cost of developing models like GPT-4.
Limitations and Regulatory Context
However, the model’s responses on political topics are notably constrained. For instance, it refuses to answer questions about sensitive subjects such as Tiananmen Square. As a Chinese company, DeepSeek is subject to oversight by China’s internet regulators, who require that its models reflect “core socialist values.” This regulatory environment often results in models declining to address contentious subjects.
DeepSeek’s Future
DeepSeek recently introduced DeepSeek-R1, a response to OpenAI’s o1 reasoning model. The organization, backed by High-Flyer Capital Management, aims to develop superintelligent AI.
In an interview, High-Flyer’s founder Liang Wenfeng noted that the closed-source nature of some AI models like OpenAI’s is a temporary barrier, asserting that it hasn’t hindered others from advancing.