LoRA

Low-Rank Adaptation, a parameter-efficient fine-tuning method that updates only a small fraction of model weights by learning low-rank decomposition matrices.

LoRA enables fine-tuning of large language models without updating all parameters. The pretrained weight matrices are frozen; LoRA instead learns a pair of small low-rank matrices per adapted layer whose product is added to the frozen weights, and this update can be merged back into the original weights for inference at no extra latency. This dramatically reduces the number of trainable parameters and the optimizer memory footprint, making model adaptation accessible to teams without massive GPU clusters while achieving results competitive with full fine-tuning.
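A minimal sketch of the idea in NumPy: a frozen weight W is augmented with a low-rank update (alpha / r) * B @ A, where B starts at zero so training begins from the pretrained behavior. The layer sizes, rank r, and scaling alpha below are hypothetical illustration values, not prescribed settings.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r = 64, 64, 8   # hypothetical layer sizes and LoRA rank
alpha = 16                   # hypothetical scaling hyperparameter

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight (not trained)
A = rng.normal(size=(r, d_in)) * 0.01  # trainable, small random init
B = np.zeros((d_out, r))               # trainable, zero init so the update starts at 0

def lora_forward(x):
    # Base path plus the low-rank update, scaled by alpha / r.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=(d_in,))
# With B at zero, the adapted layer matches the frozen base layer exactly.
assert np.allclose(lora_forward(x), W @ x)

# For inference, the update can be folded into W once, so the merged
# model has the same shape and cost as the original.
W_merged = W + (alpha / r) * (B @ A)
assert np.allclose(W_merged @ x, lora_forward(x))
```

Because r is much smaller than d_out and d_in, the trainable parameters per layer drop from d_out * d_in to r * (d_out + d_in).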

Also known as

Low-Rank Adaptation, LoRA fine-tuning