Deepseek: The Quiet Giant Leading China’s AI Race
Few stories in artificial intelligence are as compelling as that of Deepseek, a low-profile Chinese AI startup that has swiftly risen to prominence. Despite operating under the radar, Deepseek has made significant strides: its latest R1 model outperforms OpenAI’s o1 on multiple reasoning benchmarks. This achievement makes Deepseek the Chinese AI laboratory to watch and signals a potential shift in the global AI hierarchy.
Origins and Foundations
Deepseek is not an isolated venture but rather a strategic offshoot of High-Flyer (幻方), a renowned Chinese quantitative hedge fund. Valued at $8 billion, High-Flyer is one of China's top four quantitative hedge funds. Liang Wenfeng, the CEO of Deepseek, co-founded High-Flyer, which serves as Deepseek's primary financial backbone. Unlike many startups that frequently seek external funding, Deepseek is fully funded by High-Flyer, allowing it to focus on long-term research without the immediate pressures of fundraising.
Strategic Focus on Foundational Technology
Deepseek distinguishes itself from other AI startups through its unwavering commitment to foundational technology rather than immediate commercial applications. This strategic focus is evident in their decision to open source all of their models, a move that not only fosters community collaboration but also positions Deepseek as a transparent and inclusive player in the AI ecosystem. Additionally, Deepseek has ignited a price war in China’s AI model market by offering highly affordable API rates. This aggressive pricing strategy, coupled with access to High-Flyer’s substantial compute clusters—estimated to house upwards of “50k Hopper GPUs”—allows Deepseek to maintain scalability and competitive advantage.
Ambition for Artificial General Intelligence (AGI)
At the heart of Deepseek’s strategy lies an ambitious goal: to build Artificial General Intelligence (AGI). Unlike many other entities that intertwine their mission with themes of safety, competition, or humanity’s stake in AGI, Deepseek’s mission statement is refreshingly straightforward. They aim to “unravel the mystery of AGI with curiosity,” indicating a pure research-driven approach. This singular focus on AGI drives Deepseek to explore potentially transformative architectural and algorithmic innovations, setting them apart from other labs that may be more commercially or competitively motivated.
Technical Breakthroughs and Innovations
Deepseek’s journey has been marked by a series of impressive technical breakthroughs that have significantly impacted the AI landscape. Before the release of their R1-Lite-Preview model, Deepseek had already established a strong track record of innovations. Notable among these are the multi-head latent attention (MLA) architecture and the sparse mixture-of-experts (DeepseekMoE) model. These advancements have drastically reduced inference costs, triggering a price war among Chinese developers and positioning Deepseek as a cost-effective alternative to existing models.
Their coding model, trained on these architectures, has outperformed rivals such as July’s GPT-4 Turbo, a closed model, showcasing Deepseek’s ability to turn its technical innovations into tangible performance gains. These successes not only enhance Deepseek’s reputation but also contribute to the broader AI community by providing more efficient and powerful tools for developers and researchers.
An In-Depth Perspective: Interview with CEO Liang Wenfeng
To understand Deepseek’s strategies and ambitions more deeply, this piece draws on an in-depth interview with CEO Liang Wenfeng, originally published in July on a 36Kr sub-brand. The interview offers insight into Deepseek’s operations and future directions. Its key themes include:
- AGI Ambitions and Research Strategy: Liang elaborates on how Deepseek’s pursuit of AGI drives their research priorities, emphasizing the importance of foundational innovations over immediate commercial gains.
- Open Source as a Dominant Strategy: The decision to open source all models is discussed, highlighting how this approach fosters community engagement and accelerates innovation through collaborative efforts.
- Hiring and Organizational Structure: Liang explains how Deepseek leverages young domestic talent more effectively than other labs, focusing on a culture that prioritizes passion and curiosity over traditional credentials.
- Fostering Hardcore Innovation: Addressing the broader context of Chinese firms often defaulting to copying and commercialization, Liang shares his vision for Deepseek to ignite more hardcore innovation across the Chinese economy, challenging the status quo.
Unveiling Deepseek: A Tale of Technological Idealism
Originally published on WeChat. Text | Lily Yu 于丽丽. Editor | Liu Jing 刘旌.
Among China’s seven prominent large-model startups, Deepseek stands out for its discretion and unexpected impact. A year ago, Deepseek’s affiliation with High-Flyer 幻方, a quantitative hedge fund powerhouse, made it the only company outside the big tech giants with a reserve of 10,000 A100 chips. A year later, Deepseek has become the catalyst for China’s AI model price war, demonstrating significant influence despite its low profile.
In May, amidst continuous AI developments, Deepseek released an open-source model named DeepSeek V2. This model offered an unprecedented price/performance ratio, reducing inference costs to merely 1 RMB per million tokens. This cost is approximately one-seventh of the expense associated with Llama3 70B and one-seventieth of GPT-4 Turbo, making DeepSeek V2 an attractive option for developers and businesses alike.
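The ratios quoted above can be turned into rough per-million-token prices. The sketch below only does that arithmetic: it takes the 1 RMB figure and the one-seventh and one-seventieth ratios from the text and derives the prices they imply. The variable and function names are illustrative, and the derived figures are implications of the stated ratios, not published price lists.

```python
# Implied prices per million tokens, derived only from the ratios in the text.
from fractions import Fraction

DEEPSEEK_V2_RMB_PER_M = Fraction(1)  # stated: 1 RMB per million tokens

# Stated ratios: V2 costs about 1/7 of Llama3 70B and 1/70 of GPT-4 Turbo.
RATIO_TO_V2 = {
    "Llama3 70B": Fraction(1, 7),
    "GPT-4 Turbo": Fraction(1, 70),
}


def implied_price(v2_price, ratio):
    """Price of the other model if v2_price is `ratio` times that price."""
    return v2_price / ratio


implied = {
    name: implied_price(DEEPSEEK_V2_RMB_PER_M, ratio)
    for name, ratio in RATIO_TO_V2.items()
}

for name, price in implied.items():
    print(f"{name}: about {float(price):.0f} RMB per million tokens")
```

Exact fractions are used so that the division by one-seventh does not pick up floating-point noise.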
The Catalyst of a Price War
The release of DeepSeek V2 did not just introduce a new model to the market; it ignited a fierce price war among major Chinese tech giants such as ByteDance, Tencent, Baidu, and Alibaba. Dubbed the “Pinduoduo of AI,” Deepseek’s aggressive pricing strategy forced these industry leaders to slash their prices in response. This rapid sequence of price cuts highlighted the disruptive potential of Deepseek’s offerings and underscored the company’s role as a formidable competitor in the AI model market.
What set Deepseek apart was not just its pricing but also its underlying technical innovations. The company’s advances in model architecture, including the novel MLA (multi-head latent attention) and the DeepSeekMoE sparse structure, significantly reduced memory and computational costs. These innovations enabled Deepseek to offer high-performance models at a fraction of competitors’ cost, thereby triggering the price war.
Deepseek’s Quiet Dominance in Silicon Valley
In Silicon Valley, Deepseek is recognized as “the mysterious force from the East” (来自东方的神秘力量). Esteemed analysts and former industry leaders have lauded Deepseek’s technical prowess. SemiAnalysis’s chief analyst remarked that the DeepSeek V2 paper “may be the best one of the year,” while former OpenAI employee Andrew Carr praised it as “full of amazing wisdom.” Jack Clark, the former policy head at OpenAI and co-founder of Anthropic, acknowledged that DeepSeek “hired a group of unfathomable geniuses” and predicted that large models developed in China “will be as much of a force to be reckoned with as drones and electric cars.”
This recognition from prominent figures in the global AI community is a testament to Deepseek’s innovative capabilities. Their architectural innovations, particularly MLA and the DeepSeekMoE sparse structure, have set new standards for efficiency and performance, challenging the dominance of established players like OpenAI.
Breaking Through the Architectural Barrier
One of the key reasons behind Deepseek’s success is its ability to innovate at the architectural level—an area where few domestic large model companies in China have ventured. Most Chinese firms have traditionally focused on imitating existing architectures, particularly the Llama structure, to expedite product deployment. However, Deepseek’s approach diverges significantly, prioritizing foundational research to build more efficient and capable models from the ground up.
The MLA (multi-head latent attention) architecture introduced by Deepseek reduces memory usage to 5-13% of that of the commonly used MHA (multi-head attention) architecture. This drastic reduction in memory consumption not only enhances model efficiency but also lowers operational costs, making advanced AI models more accessible and affordable. Additionally, the DeepSeekMoE sparse structure minimizes computational costs, further contributing to the overall reduction in inference costs.
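To get a feel for what a reduction to 5-13% means in practice, the sketch below estimates the key-value cache of a standard MHA model and applies the quoted range to it. It assumes the memory in question is the inference-time key-value cache, which the text does not state explicitly, and the model shape (layers, heads, head size, sequence length) is hypothetical, chosen only for illustration. It does not implement MLA itself.

```python
def mha_kv_cache_bytes(n_layers, n_heads, head_dim, seq_len, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence under standard MHA."""
    # Factor 2: one cached tensor for keys and one for values in every layer.
    return 2 * n_layers * n_heads * head_dim * seq_len * bytes_per_value


# Hypothetical model shape with 16-bit values; not taken from any real model.
baseline = mha_kv_cache_bytes(n_layers=60, n_heads=128, head_dim=128, seq_len=32_768)

# Range quoted in the text: MLA needs 5-13% of the MHA figure.
low, high = 0.05 * baseline, 0.13 * baseline

GIB = 1024 ** 3
print(f"MHA cache: {baseline / GIB:.1f} GiB")
print(f"5-13% of that: {low / GIB:.1f} to {high / GIB:.1f} GiB")
```

Because the cache grows linearly with sequence length, the same percentage saving frees more absolute memory as contexts get longer, which is where the lower serving cost would come from.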
Overcoming the Innovation Gap
Deepseek’s innovations address the substantial gaps that have historically hindered Chinese AI development. Liang Wenfeng explains that the primary gaps are in training efficiency and data efficiency. Deepseek estimates a twofold gap in both areas compared to the best international standards, meaning that Chinese models require twice the computing power and twice the training data to achieve equivalent results. By focusing on architectural innovations that enhance efficiency, Deepseek aims to close these gaps and position itself at the forefront of AI research.
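The two gaps described above compound if they are independent. The sketch below works through that arithmetic under a simple assumption that is not stated in the text: total training compute is roughly the compute spent per token multiplied by the number of tokens, so a twofold gap in each multiplies to a fourfold gap overall. The function name is illustrative.

```python
def combined_compute_multiplier(training_gap, data_gap):
    """Total compute multiplier if per-token cost and token count both scale up."""
    # Assumption: total compute ~ (compute per token) x (number of tokens),
    # so independent gaps multiply rather than add.
    return training_gap * data_gap


multiplier = combined_compute_multiplier(training_gap=2.0, data_gap=2.0)
print(f"Combined compute needed: {multiplier:.0f}x the best international level")
```

Under the same assumption, halving either gap alone would halve the combined overhead, which is why improvements to training efficiency and to data efficiency are both worth pursuing.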
A Different Path: Research Over Commercialization
While many Chinese AI companies prioritize rapid commercialization and application development, Deepseek has chosen a different path—focusing solely on research and foundational technology. This decision is rooted in their long-term vision of contributing to the global innovation wave in AI. Liang Wenfeng articulates that the goal is not to capitalize on immediate profits but to drive the technical frontier and support the development of the entire AI ecosystem.
This research-centric approach allows Deepseek to allocate its resources towards solving the most challenging problems in AI, without being distracted by the demands of commercial applications. By prioritizing innovation over short-term gains, Deepseek positions itself as a pioneer in the quest for AGI, setting the stage for future breakthroughs that could redefine the boundaries of artificial intelligence.
Challenging the Status Quo: Igniting Hardcore Innovation
Deepseek’s approach stands in stark contrast to the prevailing trend among Chinese firms, which often settle for copying and commercializing existing technologies rather than pursuing original innovation. Liang Wenfeng contends that this tendency is a result of historical and economic factors, where rapid commercialization was prioritized to capitalize on lucrative opportunities. However, this approach has left Chinese companies lagging in true technological innovation.
Deepseek aims to change this narrative by fostering a culture of hardcore innovation. By focusing on foundational research and encouraging the development of original models and algorithms, Deepseek hopes to inspire other Chinese companies to follow suit. This shift towards innovation-driven growth is essential for China to transition from being a follower to a leader in the global AI landscape.
Organizational Culture and Talent Management
A critical factor in Deepseek’s success is its unique organizational culture and approach to talent management. Unlike many large tech companies that rely heavily on recruiting overseas talent, Deepseek leverages young domestic talent, often recruiting fresh graduates from top universities and PhD candidates. This strategy not only fosters a dynamic and innovative work environment but also ensures that Deepseek remains deeply rooted in the local tech ecosystem.
Liang Wenfeng emphasizes that Deepseek’s hiring standards prioritize passion and curiosity over traditional credentials. This focus on intrinsic motivation and intellectual curiosity attracts individuals who are genuinely interested in pushing the boundaries of AI, rather than those solely driven by financial incentives or prestige. As a result, Deepseek cultivates a team of highly motivated and talented researchers who are committed to the company’s long-term vision.
Building a Technical Ecosystem Through Open Source
Deepseek’s commitment to open sourcing all of its models is a strategic decision that aligns with its research-focused approach. By making their models publicly available, Deepseek fosters a collaborative environment where researchers and developers can build upon their innovations. This openness not only accelerates the pace of technological advancement but also enhances Deepseek’s reputation as a transparent and community-oriented organization.
Moreover, open sourcing helps Deepseek establish a technical ecosystem where their innovations serve as foundational building blocks for further research and development. This ecosystem approach ensures that Deepseek remains at the core of AI advancements, driving progress through collective efforts rather than isolated achievements.
Navigating the Competitive Landscape
In an industry dominated by Silicon Valley giants, Deepseek’s emergence as a formidable competitor is noteworthy. The AI wave, largely driven by the innovations and investments of Silicon Valley, now sees a significant player emerging from China’s burgeoning tech sector. Deepseek’s ability to challenge established players like OpenAI by offering superior performance at lower costs underscores the shifting dynamics of the global AI market.
Liang Wenfeng acknowledges that while Deepseek’s technical innovations are crucial, the broader goal is to integrate into the global technological innovation stream. This integration requires not only cutting-edge research but also a deep understanding of how to navigate the competitive and collaborative aspects of the global AI community.
The Road Ahead: Deepseek’s Vision for AGI
Deepseek’s ultimate ambition is to contribute to the realization of AGI, a goal that requires overcoming some of the most complex challenges in artificial intelligence. AGI, characterized by its ability to understand, learn, and apply knowledge across a wide range of tasks at a human-like level, represents the pinnacle of AI research.
Liang Wenfeng outlines a roadmap that encompasses three key directions: mathematics and code, multimodality, and natural language understanding. Mathematics and code serve as controlled environments where AGI capabilities can be rigorously tested and refined. Multimodality, involving the integration of multiple types of data and sensory inputs, is essential for AGI to interact with the real world effectively. Natural language understanding, the ability to comprehend and generate human language, is a critical component of AGI’s interaction capabilities.
Deepseek’s Competitive Moat: Team and Culture
Despite choosing to open source its models, Deepseek maintains a competitive moat through its team and organizational culture. Liang Wenfeng believes that in the face of disruptive technologies, traditional moats created by closed-source models are temporary and ultimately insufficient to sustain long-term leadership. Instead, Deepseek anchors its value in its team—leveraging the collective expertise, creativity, and collaborative spirit of its researchers to drive continuous innovation.
The open-source approach not only democratizes access to Deepseek’s advancements but also fosters a sense of community and shared purpose among developers and researchers worldwide. This collaborative ethos enhances Deepseek’s reputation and ensures that their innovations are widely adopted and further developed, creating a positive feedback loop that sustains their leadership in the AI field.
The Importance of Original Innovation
One of the core themes in Deepseek’s philosophy is the importance of original innovation over imitation. Liang Wenfeng argues that China’s AI sector has historically lagged in original technological breakthroughs, often relying on copying existing models rather than developing new ones. This reliance on imitation has created a significant gap in innovation capability, hindering China’s ability to lead in the global AI landscape.
Deepseek aims to bridge this gap by prioritizing original research and encouraging its team to explore uncharted territories in AI. This focus on original innovation not only enhances the technical capabilities of Deepseek’s models but also positions the company as a leader in pushing the boundaries of artificial intelligence.
Overcoming Challenges: Compute and Data Efficiency
One of the significant challenges facing AI research, particularly in the quest for AGI, is the efficient use of computational resources and data. Deepseek addresses these challenges through its innovative model architectures, which significantly reduce the computational and memory requirements of AI models.
The MLA architecture, for example, reduces memory usage to a fraction of that of traditional multi-head attention models, enabling more efficient training and inference. Similarly, the DeepSeekMoE sparse structure minimizes computational costs, allowing larger and more complex models to be trained without a proportional increase in resource consumption.
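The reason a sparse mixture-of-experts layer saves compute is that each input is sent to only a few experts instead of all of them. The sketch below shows generic top-k routing in plain Python. It is not DeepSeekMoE's actual design, and the toy experts, the gate scores, and the choice of 8 experts with 2 active are all hypothetical.

```python
import math
import random


def top_k_route(gate_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    # Softmax over the selected experts only, so their weights sum to 1.
    peak = max(gate_logits[i] for i in top)
    exps = [math.exp(gate_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, experts, gate_logits, k):
    """Weighted sum of the outputs of only the k selected experts."""
    return sum(w * experts[i](x) for i, w in top_k_route(gate_logits, k))


# Hypothetical setup: 8 toy "experts", each a scalar function; 2 active per input.
NUM_EXPERTS, K = 8, 2
experts = [lambda x, s=s: s * x for s in range(1, NUM_EXPERTS + 1)]

random.seed(0)
gate_logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

routes = top_k_route(gate_logits, K)
y = moe_forward(3.0, experts, gate_logits, K)
active_fraction = K / NUM_EXPERTS
print(routes, y, f"{active_fraction:.0%} of experts evaluated")
```

In this toy setting only a quarter of the experts run for each input, so total parameter count can grow by adding experts while the work done per input stays roughly fixed.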
By enhancing compute and data efficiency, Deepseek not only reduces operational costs but also makes advanced AI models more accessible to a broader range of users and developers. This efficiency is crucial for scaling AI research and development, particularly in resource-constrained environments.
Deepseek’s Role in the Global AI Ecosystem
Deepseek’s innovations and strategic decisions position it as a pivotal player in the global AI ecosystem. By focusing on foundational research, fostering a collaborative culture, and maintaining a commitment to open source, Deepseek contributes significantly to the collective advancement of artificial intelligence.
Furthermore, Deepseek’s success challenges the traditional dominance of Silicon Valley in AI research and development, highlighting the growing importance of China’s tech sector in shaping the future of AI. This shift has broader implications for the global distribution of AI expertise, resources, and leadership, potentially leading to a more diversified and competitive global AI landscape.
The Future of AI: Deepseek’s Vision and Roadmap
Looking ahead, Deepseek envisions a future where its foundational models serve as the bedrock for a wide array of AI applications and services. By continuing to push the boundaries of what is possible in AI research, Deepseek aims to develop models that are not only more efficient and capable but also more adaptable to a variety of contexts and applications.
Liang Wenfeng outlines a flexible and open-ended roadmap for Deepseek, recognizing that the path to AGI is fraught with uncertainty and complexity. However, by maintaining a focus on key areas such as mathematics, code, multimodality, and natural language understanding, Deepseek is well-positioned to make meaningful contributions to the field of AGI.
Cultivating Innovation Through a Unique Organizational Structure
Deepseek’s organizational structure plays a crucial role in fostering innovation and maintaining its competitive edge. Unlike traditional hierarchical organizations, Deepseek operates with a flat and flexible structure that encourages collaboration and the free exchange of ideas. This approach allows researchers to explore their interests and pursue innovative ideas without being constrained by rigid organizational boundaries.
Liang Wenfeng emphasizes that this bottom-up approach enables Deepseek to rapidly adapt to new challenges and opportunities, fostering a dynamic and responsive research environment. By empowering researchers to take ownership of their projects and initiatives, Deepseek cultivates a culture of innovation and excellence that drives continuous improvement and breakthrough advancements.
Deepseek’s Impact on the Chinese AI Industry
Deepseek’s success and strategic choices have significant implications for the broader Chinese AI industry. By demonstrating that a research-focused, open-source approach can lead to substantial technical breakthroughs and market disruption, Deepseek sets a new standard for AI startups in China. This shift encourages other companies to prioritize innovation and foundational research, potentially leading to a more vibrant and competitive AI ecosystem in China.
Moreover, Deepseek’s ability to leverage domestic talent effectively challenges the prevailing notion that China must rely heavily on overseas expertise to achieve technological leadership. By cultivating and empowering local talent, Deepseek showcases the potential of China’s homegrown researchers and developers to drive forward the nation’s AI ambitions.
The Path to AGI: Challenges and Opportunities
Achieving AGI remains one of the most formidable challenges in the field of artificial intelligence. Deepseek’s pursuit of this goal involves navigating a complex landscape of technical, ethical, and societal challenges. Despite these hurdles, Deepseek’s focused research and innovative strategies position it well to contribute significantly to the realization of AGI.
Liang Wenfeng acknowledges that the journey to AGI is uncertain and may take several years to decades. However, by maintaining a steadfast commitment to research and innovation, Deepseek is laying the groundwork for future breakthroughs that could bring humanity closer to achieving AGI.
Conclusion: Deepseek’s Promise for the Future
Deepseek embodies a unique blend of technological innovation, strategic foresight, and cultural commitment that sets it apart in the crowded AI startup landscape. By prioritizing foundational research, fostering a collaborative and open-source culture, and leveraging domestic talent, Deepseek is redefining what is possible in artificial intelligence.
As Deepseek continues to push the boundaries of AI research and development, it not only challenges established players but also inspires a new generation of AI innovators. The company’s journey serves as a beacon of what can be achieved through dedication, innovation, and a clear vision, positioning Deepseek as a pivotal player in the global quest for AGI.
In the rapidly advancing world of artificial intelligence, Deepseek stands as a testament to the power of focused research and strategic innovation. As the company continues to reach new milestones and push the boundaries of AI technology, it remains a key entity to watch in the unfolding story of artificial intelligence.