The introduction of the DeepSeek R1 model has sent shockwaves through the AI industry, challenging established norms and redefining the economics of AI development. DeepSeek has demonstrated that models can be trained more cost-effectively and with a smaller environmental footprint, without sacrificing performance, avoiding the exorbitant costs and energy consumption typically associated with AI training.
I think this is good news for all of us. As a big believer in the advantages that LLMs have to offer, I always feel somewhat uncomfortable knowing the environmental impact that these models have, both during training and execution. DeepSeek has shown us that a different path is possible, providing a better balance between productivity and (environmental) cost. My hope is that other AI players will now re-evaluate how to move forward and apply similar techniques to build the next generation of AI models.
What makes DeepSeek different?
DeepSeek claims to have trained its models for significantly less cost compared to other industry giants. The R1 model was reportedly trained for under $6 million, whereas OpenAI's GPT-4 cost around $100 million. But how did they achieve this?
The DeepSeek team has been quite open about how they achieved these innovations and explains their way of working in a research paper. Some key elements that they used to make the model cost-efficient and performant:
Mixture of Experts: They employed a specialized modular structure where different components of the model became "experts" in specific tasks. By activating only relevant modules during training, they significantly reduced computational demands and sped up the learning process.
Resource optimizations: Rather than relying on large supercomputers, DeepSeek distributed the training workload across multiple smaller computing nodes. This approach maintained model performance while proving more cost-effective.
Data selection: DeepSeek achieved data efficiency by combining carefully selected real-world data with artificially generated datasets. This strategic mix provided targeted, high-quality training material while avoiding the storage and processing demands of larger, more general datasets.
Algorithm optimizations: The company implemented sophisticated optimization methods, including dynamic learning rate adjustment and compressed gradients. These techniques helped achieve faster training convergence while minimizing energy use.
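To make the Mixture-of-Experts idea above a bit more concrete, here is a toy sketch (my own illustration, not DeepSeek's actual architecture): a small router scores every expert for a given input, and only the top-scoring expert runs, so most of the model's parameters stay idle on any single input.

```python
import random

random.seed(0)

NUM_EXPERTS = 4
DIM = 8

# Each "expert" is a simple linear map (a DIM x DIM weight matrix).
experts = [
    [[random.gauss(0, 0.1) for _ in range(DIM)] for _ in range(DIM)]
    for _ in range(NUM_EXPERTS)
]
# The router is a linear layer producing one score per expert.
router = [[random.gauss(0, 0.1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def moe_forward(x):
    scores = matvec(router, x)                      # one score per expert
    top = max(range(NUM_EXPERTS), key=scores.__getitem__)
    return top, matvec(experts[top], x)             # only one expert computes

x = [1.0] * DIM
chosen, y = moe_forward(x)
print(f"routed to expert {chosen}, output has {len(y)} dimensions")
```

Real MoE models route per token and usually pick the top-k experts rather than top-1, but the cost-saving principle is the same: the compute per input scales with the active experts, not with the total parameter count.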
I would recommend checking out this article by Thoughtworks, as it explains the DeepSeek team's approach in more detail.
Is DeepSeek safe?
You've probably already read somewhere that the hosted API version of DeepSeek's R1 model applies censorship mechanisms to topics the Chinese government considers politically sensitive. Additionally, there are fears that the AI system could be used for foreign influence operations, spreading disinformation, surveillance, and the development of cyberweapons for the Chinese government.
There are also claims that DeepSeek used OpenAI's API to train its own models on the outputs of OpenAI's models, a practice known as distillation.
So I can't tell you whether it is a good idea to use the hosted DeepSeek service, but those concerns shouldn't stop you from running the model locally.
Using DeepSeek R1 locally using Ollama
Ignoring all the controversy, DeepSeek R1 remains a powerful and efficient open-source large language model that offers state-of-the-art reasoning, problem-solving, and coding abilities. By running it locally, your prompts never leave your machine, which takes most of the safety and privacy concerns off the table.
The DeepSeek R1 model is available in multiple parameter sizes on Hugging Face, Azure AI Foundry, and Ollama.
So let’s give it a try:
ollama pull deepseek-r1
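Before wiring up a UI, you can sanity-check the pulled model directly against Ollama's REST API, which listens on localhost:11434 by default. A minimal Python sketch using the documented `/api/generate` endpoint (it obviously requires the Ollama server to be running):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt: str, model: str = "deepseek-r1") -> dict:
    # stream=False makes Ollama return one JSON object instead of chunks
    return {"model": model, "prompt": prompt, "stream": False}

def ask(prompt: str) -> str:
    payload = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# With Ollama running locally:
# print(ask("Why is the sky blue?"))
```

Ollama also exposes an OpenAI-compatible endpoint, so most existing client libraries can be pointed at it as well.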
Now we can try the model locally. I'm using OpenWebUI (more about OpenWebUI in this post):
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:cuda
Now we can select the DeepSeek R1 model in OpenWebUI:
Let's give it a try:
After some time (as DeepSeek is a reasoning model it can take a while before you get a result), we get a response:
DeepSeek R1 is a reasoning model, so it can be interesting to see the reasoning that happens when you ask it to execute a specific task. Therefore, expand the thinking section to see the reasoning the model applies:
You can watch in detail how it comes up with the final response:
Nice!
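If you consume the model programmatically instead of through OpenWebUI, the reasoning arrives inline: R1 wraps its chain of thought in `<think>` tags before the final answer. A small sketch to split the two apart (assuming the response follows that `<think>…</think>` convention):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Separate the model's <think> block from its final answer."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

# Hypothetical R1-style response for illustration:
sample = "<think>The user greets me, so I should greet back.</think>Hello!"
reasoning, answer = split_reasoning(sample)
print(answer)  # Hello!
```

This is handy when you want to log or display the reasoning separately, or hide it entirely and show only the final answer.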
Remark: I tried a few prompts and noticed that the model sometimes got stuck in its own reasoning.
More information
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning