FuRA: Full-Rank Parameter-Efficient Fine-Tuning with Spectral Preconditioning

Deep Analysis

Background

The article addresses the problem of noisy gradients during the fine-tuning of large language models (LLMs), which can disrupt robust pretrained features. Existing methods such as Full Fine-Tuning (Full FT) and Parameter-Efficient Fine-Tuning (PEFT, e.g., LoRA) do not account for the spectral structure established during pretraining, which leads to suboptimal performance.

Key Points

FuRA proposes a full-rank adaptation framework based on a block tensor-train factorization (W = LSR), where:

The large core (L) is fixed to the pretrained block-wise singular value decomposition (SVD) basis.
Only the compact cores (R) and the block-wise singular values (S) are optimized.
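
The factorization above can be sketched in a few lines of NumPy. This is only a simplified, single-block illustration and not the article's implementation: it assumes a plain reduced SVD in place of the block tensor-train structure, and the names `factorize` and `reconstruct` are hypothetical.

```python
import numpy as np

def factorize(W):
    """Split W into (L, s, R) such that W == L @ diag(s) @ R.

    Assumption: a single reduced SVD stands in for the block-wise
    decomposition described in the article.
    """
    L, s, R = np.linalg.svd(W, full_matrices=False)
    return L, s, R

def reconstruct(L, s, R):
    # (L * s) scales column j of L by s[j], i.e. L @ diag(s).
    return (L * s) @ R

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))   # stand-in for a pretrained weight matrix

L, s, R = factorize(W)            # L: (8, 4), s: (4,), R: (4, 4)

# In this sketch L would be frozen; only s and R would receive gradients.
frozen_params = L.size
trainable_params = s.size + R.size

W_rebuilt = reconstruct(L, s, R)
```

In this toy setting the frozen factor holds 32 values and the trainable factors hold 20, and `W_rebuilt` matches `W` up to floating-point error. The actual savings in FuRA depend on its block structure, which this sketch does not model.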

Key insights include:
Full-rank spectral preconditioning keeps updates within the pretrained column space, which prevents noisy gradients from perturbing robust pretrained features. FuRA retains the expressivity of a full-rank update while achieving parameter, memory, and step-time efficiency comparable to LoRA. The 4-bit quantized variant, QFuRA, further improves efficiency and outperforms QLoRA.
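
The column-space claim can be checked numerically. The sketch below again assumes a simplified single-block setting (a reduced SVD rather than the article's block tensor-train factorization) and uses a random perturbation of S and R as a stand-in for a training step. It shows that, with L frozen, the weight update has no component outside the column space of L, yet is not restricted to low rank.

```python
import numpy as np

rng = np.random.default_rng(0)
W0 = rng.standard_normal((8, 4))                     # stand-in pretrained weights
L, s0, R0 = np.linalg.svd(W0, full_matrices=False)   # L is frozen from here on

# Hypothetical "training step": only s and R change.
s1 = s0 + 0.1 * rng.standard_normal(s0.shape)
R1 = R0 + 0.1 * rng.standard_normal(R0.shape)

# Resulting change in the effective weight matrix W = L @ diag(s) @ R.
delta_W = (L * s1) @ R1 - (L * s0) @ R0

# Projector onto the orthogonal complement of col(L).
P_perp = np.eye(L.shape[0]) - L @ L.T

# Norm of the part of the update that leaves the pretrained column space.
residual = np.linalg.norm(P_perp @ delta_W)

# Rank of the update; "full rank" here means min(8, 4) = 4.
update_rank = np.linalg.matrix_rank(delta_W)
```

Because `delta_W` can be written as `L @ (diag(s1) @ R1 - diag(s0) @ R0)`, `residual` is zero up to floating-point error regardless of how `s` and `R` are perturbed, while `update_rank` is limited only by the size of the inner factor and not by a chosen low rank as in LoRA.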

Experiments

The article evaluates FuRA across multiple settings:

LLM fine-tuning (e.g., LLaMA-3-8B commonsense reasoning): FuRA achieves a +1.37 improvement over Full FT.
LLM reinforcement learning for mathematical reasoning: FuRA performs better than Full FT.
Visual instruction tuning for VLMs: FuRA outperforms Full FT.

Significance

FuRA's significance lies in combining superior performance with LoRA-like efficiency. Its full-rank spectral preconditioning mitigates noisy gradients by keeping updates within the pretrained column space. The framework outperforms Full FT and surpasses LoRA variants both before and after quantization, which makes it particularly valuable for applications that require high performance under limited computational resources.

Disclaimer: The above content is generated by AI and is for reference only.
