Learned Relay Representations for Forward-Thinking Discrete Diffusion Models

MDMs discard valuable internal computation during iterative refinement, necessitating redundant re-computation. Relay introduces a method to propagate

Deep Analysis

Background

Masked Diffusion Models (MDMs) generate sequences by iterative refinement: each forward pass predicts tokens for the positions that are still masked, and the partially unmasked sequence becomes the input to the next pass. Only the tokens carry over between steps. The internal representations computed during a pass are discarded, so every later step has to recompute information the model had already derived. This redundancy wastes computation and can hurt performance.
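To make the redundancy concrete, here is a minimal sketch of a baseline decoding loop. Everything in it is an assumption made for illustration: `toy_forward` is a hypothetical stand-in for a real MDM forward pass, the "task" is simply counting upward from the first token, and confidence-based unmasking is just one common sampling strategy. The point is only the shape of the loop: the hidden values are computed at every step and then dropped.

```python
MASK = -1  # hypothetical sentinel for a masked position


def toy_forward(tokens):
    """Hypothetical stand-in for one MDM forward pass.

    Returns per-position hidden values, predictions and confidences.
    A masked position is predicted confidently only if its left
    neighbour is already known (toy rule: next token = left + 1).
    """
    hidden, preds, confs = [], [], []
    for i, tok in enumerate(tokens):
        left = tokens[i - 1] if i > 0 else MASK
        if tok != MASK:
            pred, conf = tok, 1.0
        elif left != MASK:
            pred, conf = left + 1, 0.9
        else:
            pred, conf = 0, 0.1  # uninformed guess
        hidden.append(float(pred))  # toy "internal representation"
        preds.append(pred)
        confs.append(conf)
    return hidden, preds, confs


def decode(tokens, per_step=1):
    """Baseline loop: unmask the most confident positions each step."""
    tokens = list(tokens)
    steps = 0
    while MASK in tokens:
        # Recomputed from scratch: only `tokens` is carried between steps.
        _hidden, preds, confs = toy_forward(tokens)
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        masked.sort(key=lambda i: confs[i], reverse=True)
        for i in masked[:per_step]:
            tokens[i] = preds[i]
        steps += 1
        # `_hidden` goes out of scope here and is never reused.
    return tokens, steps
```

With these toy rules, `decode([3, MASK, MASK, MASK])` returns `([3, 4, 5, 6], 3)`: three forward passes, each starting again from the tokens alone.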

Key Points

To address this inefficiency, the paper proposes Learned Relay Representations (Relay). Relay lets an MDM propagate latent information from one forward pass to the next through differentiable per-token channels, and the model is explicitly trained to use them. The key insight is that training these channels with truncated backpropagation through time (BPTT), in which gradients flow back across a limited number of consecutive decoding steps, teaches the model to retain and reuse internal computation from earlier steps instead of recomputing it.
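The following toy illustrates truncated BPTT on a state that is carried across steps. It is a sketch under heavy assumptions, not the paper's method: a scalar linear recurrence `h_t = w * h_{t-1} + x_t` stands in for the model and its relay channel, the loss is a squared error at every step, and truncation is done by cutting the gradient at fixed window boundaries (one common variant). The names `forward`, `loss` and `tbptt_grad` are hypothetical.

```python
def forward(w, xs, h0=0.0):
    """Carry a scalar state across steps: h_t = w * h_{t-1} + x_t."""
    hs = [h0]
    for x in xs:
        hs.append(w * hs[-1] + x)
    return hs  # hs[t] is the state after step t


def loss(w, xs, ys):
    """Sum of per-step squared errors between state and target."""
    hs = forward(w, xs)
    return sum(0.5 * (h - y) ** 2 for h, y in zip(hs[1:], ys))


def tbptt_grad(w, xs, ys, window):
    """Gradient of the loss w.r.t. w, truncated at window boundaries.

    Inside a window, gradients flow backwards through the recurrence.
    At a window boundary the incoming state is treated as a constant,
    so no gradient reaches earlier windows through it.
    """
    hs = forward(w, xs)
    total_steps = len(xs)
    grad_w = 0.0
    for start in range(0, total_steps, window):
        end = min(start + window, total_steps)
        g_h = 0.0  # gradient arriving from later windows is dropped
        for t in range(end, start, -1):
            g_h += hs[t] - ys[t - 1]   # direct loss term at step t
            grad_w += g_h * hs[t - 1]  # dh_t/dw = h_{t-1}
            g_h *= w                   # pass gradient back to h_{t-1}
    return grad_w
```

When `window` covers the whole sequence, this reduces to ordinary BPTT and matches a finite-difference estimate of the full gradient. A smaller window is cheaper but ignores how a step's state affects losses beyond the window, which is the trade-off truncation makes.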

Relay is designed to be compatible with state-of-the-art Diffusion Language Models (DLMs) such as Fast-dLLM v2. According to the paper, it scales without disrupting existing techniques such as block diffusion and KV caching. The authors first justify the framework on a Sudoku-based planning task and then apply it to Fast-dLLM v2.

Significance

Relay explicitly trains DLMs to relay latent information forward across decoding steps, which the paper reports advances the performance-latency Pareto frontier for DLMs. Specifically, it outperforms standard supervised finetuning on coding tasks while reducing inference latency by up to 32%. The paper provides empirical evidence and code for all experiments.
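For intuition, the latency figure can be converted into a speedup factor. This is plain arithmetic on the reported "up to 32%" and assumes the reduction is measured against a baseline latency $T_{\text{base}}$; the summary does not state the exact baseline configuration.

```latex
T_{\text{relay}} \;\ge\; (1 - 0.32)\, T_{\text{base}} = 0.68\, T_{\text{base}}
\qquad\Longrightarrow\qquad
\frac{T_{\text{base}}}{T_{\text{relay}}} \;\le\; \frac{1}{0.68} \approx 1.47
```

In other words, in the best reported case decoding is roughly 1.47 times faster.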

In short, Relay trains MDMs to reuse their internal computation rather than discard it, which improves performance at lower computational overhead. More broadly, it suggests one way to manage and reuse latent information in sequence generation tasks.

Disclaimer: The above content is generated by AI and is for reference only.
