nanochat
Andrej Karpathy's open-source project for efficient GPT-2 training, designed as a minimal, forkable research codebase.
nanochat is an MIT-licensed open-source project by Andrej Karpathy that provides roughly 1,000 lines of readable code for training GPT-2-scale language models. It serves both as a speedrun benchmark for training efficiency and as a forkable starting point for researchers experimenting with new optimization techniques. The project incorporates dozens of modern training optimizations and has compressed GPT-2 training to under 3 hours on a single 8×H100 node.
Also known as
nano-chat