DoRA: Weight-Decomposed Low-Rank Adaptation

Abstract

DoRA decomposes the pre-trained weight into two components, magnitude and direction, for fine-tuning, specifically employing LoRA for directional updates to efficiently minimize the number of trainable parameters. By employing DoRA, we enhance both the learning capacity and training stability of LoRA while avoiding any additional inference overhead. DoRA consistently outperforms LoRA when fine-tuning LLaMA, LLaVA, and VL-BART on various downstream tasks, such as commonsense reasoning, visual instruction tuning, and image/video-text understanding.
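To make the decomposition concrete, the sketch below shows one way the idea can be expressed for a linear layer: the frozen pre-trained weight plus a low-rank LoRA update is normalized to give the direction, and a learnable magnitude vector rescales it. This is a minimal, illustrative PyTorch sketch, not the official NVlabs/DoRA or HuggingFace PEFT implementation; the class and parameter names (DoRALinear, lora_A, lora_B, magnitude, rank) are assumptions made here for clarity.

import torch
import torch.nn as nn


class DoRALinear(nn.Module):
    """Illustrative DoRA-style layer: W' = m * (W0 + B A) / ||W0 + B A||."""

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        out_features, in_features = base.weight.shape
        # Frozen pre-trained weight W0.
        self.weight = nn.Parameter(base.weight.detach().clone(), requires_grad=False)
        # Low-rank LoRA factors for the directional update; B starts at zero,
        # so the layer initially reproduces the pre-trained layer exactly.
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        # Learnable magnitude vector m, one scalar per output feature,
        # initialized to the norm of the corresponding pre-trained weight vector.
        self.magnitude = nn.Parameter(self.weight.norm(p=2, dim=1).clone())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Directional component: LoRA-updated weight, normalized per output feature.
        # (The paper also notes this norm can be treated as a constant during
        # backpropagation to reduce training memory; omitted here for simplicity.)
        adapted = self.weight + self.lora_B @ self.lora_A
        direction = adapted / adapted.norm(p=2, dim=1, keepdim=True)
        # Rescale each unit-norm direction vector by its learned magnitude.
        return nn.functional.linear(x, self.magnitude.unsqueeze(1) * direction)


# Hypothetical usage: wrap an existing projection and train only the DoRA parameters.
layer = DoRALinear(nn.Linear(768, 768, bias=False), rank=16)
out = layer(torch.randn(2, 768))

Only the magnitude vector and the two low-rank factors receive gradients, so the trainable-parameter count stays close to LoRA's, and the adapted weight can be merged back into a single matrix after training, which is why no additional inference overhead is incurred.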
Extra resources:
- Jeremy Howard: general introduction
- Sebastian Raschka: QDoRA (an amazing project comparing QDoRA and QLoRA)

Publication
ICML 2024