DeepSeek R1

DeepSeek's open-weight reasoning model that matches OpenAI o1 on key benchmarks, trained with GRPO on top of the 671B-parameter Mixture of Experts DeepSeek-V3 base model, whose pretraining reportedly cost approximately $6 million.

DeepSeek R1 is a frontier reasoning model released in January 2025 by the Chinese AI lab DeepSeek. Built on the DeepSeek V3 Mixture of Experts architecture (671B total parameters, 37B activated per token), R1 scores 79.8% on AIME 2024, matching OpenAI o1's 79.2%. The model was trained using Group Relative Policy Optimization (GRPO), a reinforcement learning method that drops PPO's learned critic model and instead normalizes each reward against the other sampled outputs for the same prompt. DeepSeek released the weights publicly along with detailed training documentation, and the distilled versions (1.5B to 70B parameters) have become popular for local deployment.
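The core idea of GRPO can be sketched in a few lines: instead of a critic predicting a baseline value, the advantage of each sampled completion is its reward normalized by the mean and standard deviation of the rewards in its own group. This is a minimal illustrative sketch (the function name and the rule-based 0/1 rewards are assumptions for the example, not DeepSeek's code):

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantages for one prompt's group of sampled completions:
    each reward is normalized against the group's own mean and standard
    deviation, so no learned critic (value model) is needed."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in rewards]

# Example: four sampled answers to one math prompt, scored by a
# rule-based verifier (1.0 = correct final answer, 0.0 = incorrect).
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
# → [1.0, -1.0, -1.0, 1.0]
```

Correct answers get positive advantages and incorrect ones negative, purely from within-group comparison; in full GRPO these advantages then weight a clipped PPO-style policy-gradient objective.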

Also known as

R1, DeepSeek-R1. (DeepSeek-R1-Zero is a distinct precursor model trained with pure RL and no supervised fine-tuning, not an alias for R1.)