Chinchilla Scaling

DeepMind's 2022 finding that compute-optimal LLM training should scale training data and model parameters roughly equally, overturning the earlier assumption that extra compute was best spent on larger models.

Chinchilla scaling refers to the compute-optimal training regime identified in DeepMind's 2022 paper 'Training Compute-Optimal Large Language Models.' The research found that for a given compute budget, training data and model parameters should be scaled roughly in proportion (the '20 tokens per parameter' rule of thumb), rather than prioritizing model size as earlier OpenAI scaling-law research (Kaplan et al., 2020) had suggested. The finding reshaped industry training practices, making data rather than parameter count the primary bottleneck. Later replication attempts found that the exact optimal ratio varies with compute budget and that the original paper's parametric fit contained systematic errors.
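The rule of thumb can be turned into quick arithmetic. A minimal sketch, assuming the standard approximation that training cost is C ≈ 6·N·D FLOPs (N parameters, D tokens) and the Chinchilla heuristic D ≈ 20·N; solving 6·N·(20·N) = C gives N = √(C/120). The function name and the fixed 20:1 ratio are illustrative simplifications, not the paper's full fitted scaling law:

```python
import math

def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    """Estimate compute-optimal model and dataset size for a FLOP budget.

    Assumes C ~= 6 * N * D and the heuristic D ~= tokens_per_param * N,
    so 6 * N * (tokens_per_param * N) = C and N = sqrt(C / (6 * tokens_per_param)).
    """
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Chinchilla itself used roughly 5.76e23 training FLOPs; the heuristic
# recovers approximately its published configuration (~70B params, ~1.4T tokens).
params, tokens = chinchilla_optimal(5.76e23)
```

Plugging in a budget ten times larger shows why data became the bottleneck: both N and D grow by only √10 ≈ 3.2×, so every jump in model size demands a matching jump in training tokens.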

Also known as

Chinchilla optimal, compute-optimal scaling, Hoffmann scaling