
FuRA: Full-Rank Parameter-Efficient Fine-Tuning with Spectral Preconditioning

FuRA (Full-Rank Adaptation) introduces a novel approach to fine-tuning large language models by reparameterizing weight matrices through their singula


Deep Analysis

Background

The paper targets noisy gradients during the fine-tuning of large language models (LLMs), which can disrupt robust pretrained features. Existing methods, both Full Fine-Tuning (Full FT) and Parameter-Efficient Fine-Tuning (PEFT) methods such as LoRA, do not account for the spectral structure established during pretraining, which the authors argue leads to suboptimal performance.

Key Points

FuRA proposes a full-rank adaptation framework based on a block tensor-train factorization (W = LSR), where:

  • The large core (L) is fixed to the pretrained block-wise singular value decomposition (SVD) basis.
  • Only the compact cores (R) and the block-wise singular values (S) are optimized.
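The factorization above can be sketched in NumPy. This is a simplified illustration under stated assumptions, not the paper's implementation: it assumes the "blocks" are equal-width column blocks of W, each factored with an ordinary SVD, and the helper names (block_svd_factorize, reconstruct, count_params) are hypothetical. The exact block tensor-train layout used by FuRA may differ.

```python
import numpy as np


def block_svd_factorize(W, num_blocks):
    """Split W into equal-width column blocks and factor each block
    as L_i @ diag(s_i) @ R_i using a thin SVD.

    Assumption for this sketch: column blocks. L_i would be frozen;
    s_i and R_i would be the trainable parts.
    """
    m, n = W.shape
    if n % num_blocks != 0:
        raise ValueError("number of columns must be divisible by num_blocks")
    factors = []
    for W_i in np.split(W, num_blocks, axis=1):
        L_i, s_i, R_i = np.linalg.svd(W_i, full_matrices=False)
        factors.append((L_i, s_i, R_i))
    return factors


def reconstruct(factors):
    """Rebuild the full weight matrix from the per-block factors."""
    blocks = [L_i @ np.diag(s_i) @ R_i for L_i, s_i, R_i in factors]
    return np.concatenate(blocks, axis=1)


def count_params(factors):
    """Return (frozen, trainable) parameter counts for this sketch."""
    frozen = sum(L_i.size for L_i, _, _ in factors)
    trainable = sum(s_i.size + R_i.size for _, s_i, R_i in factors)
    return frozen, trainable


# Toy example: a random 64x32 matrix stands in for a pretrained weight.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))
factors = block_svd_factorize(W, num_blocks=4)
frozen, trainable = count_params(factors)
print("reconstruction exact:", np.allclose(reconstruct(factors), W))
print("frozen params:", frozen, "trainable params:", trainable)
```

In this toy setup each large core L_i has shape (64, 8) while each compact core R_i has shape (8, 8), so most parameters sit in the frozen part even though the reassembled W is not restricted to low rank.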

Key insights:
Because L is fixed, every update stays within the pretrained column space; this full-rank spectral preconditioning prevents noisy gradients from perturbing robust pretrained features. FuRA retains full-rank update expressivity while achieving parameter, memory, and step-time efficiency comparable to LoRA. A 4-bit quantized variant, QFuRA, further improves efficiency while outperforming QLoRA.
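The column-space claim can be checked numerically on a single block. The sketch below is an illustration under assumptions, not the paper's training procedure: it uses plain SGD (the actual optimizer is not stated here), a random matrix G as a stand-in for a noisy gradient with respect to W, and a hypothetical helper sgd_step. With L frozen, any change to S and R produces a weight update of the form L times something, so the part of G orthogonal to the columns of L never reaches W.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k = 16, 4

# Stand-in for one pretrained block and its thin SVD: W0 = L @ diag(s) @ R.
W0 = rng.standard_normal((m, k))
L, s, R = np.linalg.svd(W0, full_matrices=False)  # L: (m, k), s: (k,), R: (k, k)


def sgd_step(L, s, R, G, lr=0.1):
    """One plain SGD step on s and R with L frozen.

    G is the gradient of the loss with respect to W = L @ diag(s) @ R.
    By the chain rule:
        dloss/ds = diag(L.T @ G @ R.T)
        dloss/dR = diag(s) @ L.T @ G
    """
    A = L.T @ G                      # gradient expressed in the frozen basis
    grad_s = np.diag(A @ R.T)
    grad_R = np.diag(s) @ A
    return s - lr * grad_s, R - lr * grad_R


G = rng.standard_normal((m, k))      # hypothetical noisy gradient w.r.t. W
s_new, R_new = sgd_step(L, s, R, G)

delta_W = L @ np.diag(s_new) @ R_new - W0

# Part of the update lying outside span(L); should vanish up to round-off.
residual = delta_W - L @ (L.T @ delta_W)
# Part of the raw gradient lying outside span(L); generally non-zero.
discarded = G - L @ (L.T @ G)

print("update outside span(L):  ", np.linalg.norm(residual))
print("gradient outside span(L):", np.linalg.norm(discarded))
```

The first norm is zero up to floating-point error, while the second is generally not, which is the sense in which the frozen basis filters the gradient in this sketch.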

Experiments

The article evaluates FuRA across multiple settings:

  • LLM fine-tuning (e.g., LLaMA-3-8B commonsense reasoning): FuRA achieves a +1.37 improvement over Full FT.
  • LLM reinforcement learning for mathematical reasoning: FuRA performs better than Full FT.
  • Visual instruction tuning for VLMs: FuRA outperforms Full FT.

Significance

FuRA's significance lies in pairing strong performance with efficiency. Keeping updates within the pretrained column space addresses the noisy-gradient problem, and in the reported experiments the framework outperforms Full FT and surpasses LoRA variants both before and after quantization. This makes the design particularly valuable for applications that need high performance with limited computational resources.

Disclaimer: The above content is generated by AI and is for reference only.
