Supervised Fine-tuning

A training stage that transforms a pre-trained language model into a conversational assistant by training on curated examples of ideal responses.

Supervised fine-tuning (SFT) is the middle stage of the modern LLM training pipeline, sitting between pre-training and RLHF. Using a curated dataset of desired assistant behavior, typically on the order of 10,000 to 100,000 examples, SFT trains the model with the same next-token prediction objective as pre-training, but the loss is usually computed only on the assistant's response tokens. This teaches the model conversational format, helpful tone, and appropriate response structure, turning the raw completion engine produced by pre-training into something that behaves like a chatbot before RLHF applies the final preference optimization.
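The mechanics can be illustrated with a minimal sketch. The key detail is the loss mask: next-token cross-entropy is computed only at positions belonging to the assistant's response, so the model learns to produce answers rather than to imitate prompts. Everything below (the toy token ids, the tiny logit table, the `sft_loss` helper) is illustrative, not a real library API:

```python
import math

def sft_loss(logits, target_ids, loss_mask):
    """Mean next-token cross-entropy over positions where loss_mask is 1.

    logits     : one list of vocab-sized scores per sequence position
    target_ids : the correct next token at each position
    loss_mask  : 1 for assistant-response positions, 0 for prompt positions
    """
    total, count = 0.0, 0
    for pos_logits, target, keep in zip(logits, target_ids, loss_mask):
        if not keep:  # prompt/user tokens contribute no training signal
            continue
        log_z = math.log(sum(math.exp(x) for x in pos_logits))
        total += log_z - pos_logits[target]  # -log softmax(pos_logits)[target]
        count += 1
    return total / count

# A chat example flattened to token ids by a hypothetical tokenizer:
# "<user> hi <assistant> hello !" -- only the assistant span is trained on.
target_ids = [4, 7, 5, 9, 2]
loss_mask  = [0, 0, 0, 1, 1]  # train only on the "hello !" positions

# Fake model outputs: vocab size 10, with the correct token favored.
logits = [[0.0] * 10 for _ in target_ids]
for i, t in enumerate(target_ids):
    logits[i][t] = 3.0

loss = sft_loss(logits, target_ids, loss_mask)
print(loss)
```

In a real pipeline the mask is built by the chat-template tokenizer (often by setting masked labels to an ignore index such as -100), but the arithmetic is exactly this: ordinary language-model loss, restricted to the demonstration's response spans.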

Also known as

SFT, instruction tuning, instruction fine-tuning