«It feels like a major development for AI research.»
The launch of the Chinese AI chatbot DeepSeek two weeks ago made headlines and shook the financial markets and the technology world. Reza Kakooee, AI researcher at the FHNW School of Computer Science and an expert in reinforcement learning for AI models, explains the reasons for DeepSeek's success and what it means for the future of AI research. He calls for increased efforts by Swiss actors to catch up in AI.
Reza, DeepSeek has caused quite an uproar. Major news outlets reported on it, DeepSeek's website went down, and stocks of companies that profit from the AI boom, such as chip maker Nvidia, plunged. What happened?
DeepSeek is a chatbot launched by a Chinese company of the same name. It is based on a large language model that the company had already released in December 2024. Two weeks ago, they made their new reasoning model available to the public, along with a scientific paper in which the team explained its methodology.
Is the great excitement around DeepSeek justified, or is it short-lived hype?
It feels like a major development for AI research. There are several aspects of DeepSeek that surprised the AI community, myself included. While DeepSeek's approach aligns with current AI research trends, they employed different techniques to train the model in a less resource-intensive manner, achieving performance comparable to that of its Western competitors.
What does DeepSeek do differently from others?
What's unique about it is that it uses established techniques from AI research but combines them in a more efficient way.
According to DeepSeek's paper, they managed to train their model with far fewer resources than their competitors. DeepSeek's approach introduces a shift in the application of scaling laws, demonstrating that with new algorithms and optimized training, better performance can be achieved with fewer compute resources. DeepSeek claims to have used only about 2,000 AI chips to train its reasoning model, although some claim the company has around 50,000. Even then, it would still be much less expensive than what OpenAI spent on its GPT-based models.
DeepSeek offers its services at a significantly lower cost than its competitors. While OpenAI charges $60 per million output tokens for its o1 model, DeepSeek's reasoning model costs only about $2, which is 30 times less. However, this is the current pricing, and OpenAI may lower its prices for future models. In general, the cost of AI, like any other technology, tends to decrease over time. OpenAI trained its earlier models at a higher price, but perhaps that is the price of being first to innovate.
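As a back-of-the-envelope check, the ratio follows directly from the two prices quoted above (both are current list prices and may change):

```python
# Rough price comparison per million output tokens, using the figures above.
openai_o1 = 60.0     # USD per million output tokens (OpenAI o1)
deepseek_r1 = 2.0    # USD per million output tokens (DeepSeek, approximate)
print(f"DeepSeek is about {openai_o1 / deepseek_r1:.0f}x cheaper")  # -> 30x
```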
«AI models will be critical for future developments in business and daily life, and in Switzerland, we need to have our own AI models that are well-aligned with our cultural values.»
How did DeepSeek achieve this high efficiency?
The big AI models by OpenAI or Anthropic are usually trained in three stages. First, the model is trained on a large dataset, then fine-tuned with good-quality data. In the third stage, 'reinforcement learning', humans rank the model's responses, so the model learns what good answers look like. But human labour is costly compared to computing power, and the process requires additional models, which adds to the training costs.
DeepSeek had a surprisingly simple solution to skip this last step for their reasoning model. Instead of generating answers to questions like 'What is it like to live on the moon?', which must be assessed by humans, DeepSeek generated answers to questions that can be verified programmatically. For example, the AI had to write a certain piece of code. The validity of that code can be tested automatically, and the answers ranked according to whether the code is correct.
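To illustrate the principle, here is a minimal sketch of such a programmatically checkable reward in Python. It shows the idea only, not DeepSeek's actual pipeline; the task (implement `square(x)`) and the `reward` function are hypothetical examples.

```python
# A reward that needs no human judge: a test suite decides whether the
# model's answer (a piece of generated code) is correct.
def reward(generated_code: str, tests: list[tuple[int, int]]) -> float:
    """Return 1.0 if the generated function passes all tests, else 0.0."""
    namespace: dict = {}
    try:
        # In practice the untrusted code would run in a sandbox.
        exec(generated_code, namespace)   # define the candidate function
        fn = namespace["square"]          # hypothetical task: implement square(x)
        passed = all(fn(x) == expected for x, expected in tests)
    except Exception:
        passed = False                    # crashing or missing code earns no reward
    return 1.0 if passed else 0.0

tests = [(2, 4), (3, 9), (-1, 1)]
print(reward("def square(x):\n    return x * x", tests))  # 1.0: correct answer
print(reward("def square(x):\n    return x + x", tests))  # 0.0: wrong answer
```

Because such a check is cheap and automatic, it can replace the costly human ranking step during reinforcement learning.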
What does this mean for future developments in AI?
Another surprise to me was that DeepSeek released their model as open weights, together with a paper detailing their methods. So everybody can see how the model was trained and how it works, and anyone can download it for their own use.
For these two reasons, DeepSeek's efficiency and its easy availability, this could prove to be a pivotal moment in AI. We will need fewer resources to run better AI models, which will likely lead to broader adoption of AI.
That does not mean, however, that fewer resources will be spent on AI. The so-called Jevons paradox states that increasing efficiency can lead to even higher overall resource use, mainly because AI applications that were previously too expensive now become possible.
Which new areas of AI application will open up now?
For smaller teams, it becomes attractive to adapt existing large language models to their own needs by fine-tuning them on their own data. For example, we at the FHNW School of Computer Science can help companies build custom AI models by fine-tuning them on the company's data, making them more useful for their use cases and the Swiss market.
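As a rough illustration of what such a project can look like technically, here is a minimal parameter-efficient fine-tuning sketch using the Hugging Face transformers, datasets and peft libraries. The base model and the two-line 'company corpus' are placeholders; a real engagement would use a proper dataset, a suitable open-weights model and careful evaluation.

```python
# Minimal LoRA fine-tuning sketch: adapt an open-weights language model
# to a small in-domain corpus. Model name and data are placeholders.
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "Qwen/Qwen2.5-0.5B"  # placeholder: any small open-weights causal LM
tokenizer = AutoTokenizer.from_pretrained(base)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA trains only small adapter matrices on top of the frozen base model,
# which keeps the compute requirements within reach of smaller teams.
model = get_peft_model(model, LoraConfig(
    r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
))

# Toy stand-in for a company's own documents.
corpus = Dataset.from_dict({"text": [
    "Our support hotline is open Monday to Friday, 8:00-17:00.",
    "Invoices are issued in Swiss francs (CHF) at the end of each month.",
]})
tokenized = corpus.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=256),
    batched=True, remove_columns=["text"],
)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```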
How should Swiss companies and decision-makers react to DeepSeek?
DeepSeek has shown that it is possible to catch up with the big US companies using fewer resources. But it raises a major question for Switzerland and Europe: why couldn't a Swiss or European company come up with such an innovation, even though they apparently have more resources? AI models will be critical for future developments in business and daily life, and in Switzerland we need to have our own AI models that are well aligned with our cultural values. This requires a dedicated team of AI talent working around the clock for at least the next 2–3 years to close the gap with the competition.