QLoRA

A memory-efficient fine-tuning method that combines LoRA adapters with 4-bit quantization of the frozen base model weights, trading longer training time for roughly 33% memory savings.

QLoRA (Quantized LoRA) extends the LoRA fine-tuning approach by quantizing the frozen base model weights to 4-bit precision while training the low-rank adapter matrices at higher (typically 16-bit) precision. Because the quantized weights must be dequantized on the fly during each forward and backward pass, training takes approximately 39% longer than standard LoRA. In exchange, GPU memory requirements drop significantly, enabling fine-tuning of larger models on consumer hardware, with minimal impact on final model accuracy for most tasks.
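
The mechanism can be sketched in plain NumPy: the base weight matrix is stored in block-wise 4-bit form and dequantized on the fly in the forward pass, while full-precision low-rank matrices A and B carry the trainable update. This is an illustrative sketch, not the real implementation: it uses a simplified absmax int4 scheme in place of QLoRA's NF4 data type, and the dimensions, rank, and scaling factor are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_4bit(w, block=64):
    """Absmax-quantize weights to signed 4-bit integers per block
    (a simplified stand-in for QLoRA's NF4 quantization)."""
    flat = w.reshape(-1, block)
    scale = np.abs(flat).max(axis=1, keepdims=True) / 7.0   # int4 range -7..7
    q = np.clip(np.round(flat / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale, shape):
    """Recover an approximate float32 weight matrix from 4-bit codes."""
    return (q.astype(np.float32) * scale).reshape(shape)

d, k, r = 256, 256, 8            # base weight is d x k; LoRA rank r
W = rng.standard_normal((d, k)).astype(np.float32)

# Frozen base weights: stored only as 4-bit codes plus per-block scales.
qW, scales = quantize_4bit(W)

# Trainable LoRA adapters at full precision; B starts at zero so the
# adapted model initially matches the quantized base model exactly.
A = (rng.standard_normal((r, k)) * 0.01).astype(np.float32)
B = np.zeros((d, r), dtype=np.float32)
alpha = 16.0                     # LoRA scaling hyperparameter

def forward(x):
    # Dequantize on the fly (the source of QLoRA's extra training time),
    # then add the low-rank adapter path.
    W_deq = dequantize(qW, scales, W.shape)
    return W_deq @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(k).astype(np.float32)
y = forward(x)
```

During training only A and B receive gradients; the 4-bit codes stay fixed, which is where the memory savings come from: the base weights cost 4 bits each (plus small per-block scales) instead of 16 or 32.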

Also known as

Quantized LoRA, 4-bit LoRA