
Query-Adaptive Semantic Chunking for Retrieval-Augmented Generation: A Dynamic Strategy with Contextual Window Expansion

QASC significantly enhances the retrieval of relevant context in RAG systems by dynamically constructing chunks based on user queries, achieving an F1-score of 0.85.

Deep Analysis

Background

Retrieval-augmented generation (RAG) systems depend on the quality of document chunking to retrieve appropriate context. Traditional methods such as fixed-size and recursive splitting produce uniform segments regardless of semantic content or user intent, which creates a precision-recall trade-off that adjusting chunk size alone cannot resolve: small chunks match precisely but lose surrounding context, while large chunks keep context but dilute relevance. Semantic and agentic methods partially address this issue, but they do not take the user query into account at the chunking stage.

Key Points

Query-Adaptive Semantic Chunking (QASC) improves upon existing techniques by dynamically constructing chunks based on user queries through three mechanisms:

  1. Cosine Similarity Scoring: Identifies seed sentences with high similarity to user queries.
  2. Contextual Window Expansion: Expands the contextual window around these seed sentences to preserve coherence.
  3. Chunk-Level Score Aggregation: Aggregates similarity scores at the chunk level so that each expanded chunk is judged on its overall relevance, not on its seed sentence alone.
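
The three mechanisms above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the bag-of-words `embed` function stands in for whatever sentence encoder the authors use, and the names `qasc_chunks`, `seed_threshold`, and `window`, the merging of overlapping windows, and the use of a mean as the aggregate are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: a real system would use a sentence encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def qasc_chunks(sentences, query, seed_threshold=0.3, window=1):
    """Return (score, chunk_text) pairs, highest score first.

    seed_threshold and window are hypothetical parameters chosen for illustration.
    """
    q = embed(query)
    scores = [cosine(embed(s), q) for s in sentences]

    # 1. Cosine similarity scoring: sentences at or above the threshold become seeds.
    seeds = [i for i, sc in enumerate(scores) if sc >= seed_threshold]

    # 2. Contextual window expansion: take `window` neighbours on each side,
    #    merging spans that overlap or touch so no sentence is duplicated.
    spans = []
    for i in seeds:
        start, end = max(0, i - window), min(len(sentences) - 1, i + window)
        if spans and start <= spans[-1][1] + 1:
            spans[-1][1] = max(spans[-1][1], end)
        else:
            spans.append([start, end])

    # 3. Chunk-level score aggregation: mean sentence score per chunk (an assumed choice).
    chunks = []
    for start, end in spans:
        chunk_score = sum(scores[start:end + 1]) / (end - start + 1)
        chunks.append((chunk_score, " ".join(sentences[start:end + 1])))
    return sorted(chunks, key=lambda c: c[0], reverse=True)


if __name__ == "__main__":
    doc = [
        "Install the package with pip.",
        "The cache stores query results.",
        "Cache eviction uses an LRU policy.",
        "Logging is configured separately.",
        "Contact support for licensing.",
    ]
    for score, text in qasc_chunks(doc, "cache eviction policy"):
        print(f"{score:.3f}  {text}")
```

With these toy inputs only the third sentence clears the threshold, and the resulting chunk also pulls in its two neighbours. The mean is one plausible aggregate; a maximum or a length-weighted sum would change which chunks rank highest.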

QASC was evaluated on 100 technical documents using 200 queries of four types. It was compared against fixed chunking at five granularities, recursive splitting, semantic chunking, and agentic chunking. QASC achieved an F1-score of 0.85, a relative improvement of 18-27% over fixed chunking and 8-12% over the semantic and agentic alternatives.
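
To put the relative figures in perspective, the short sketch below back-computes the baseline F1 ranges they imply. It assumes "relative improvement" means (QASC score minus baseline score) divided by the baseline score; the summary does not report baseline scores, so the printed ranges are derived estimates, not results from the paper. The function names are made up for the example.

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def implied_baseline(new_score, relative_gain):
    """Baseline score b such that (new_score - b) / b == relative_gain."""
    return new_score / (1.0 + relative_gain)


QASC_F1 = 0.85  # reported in the summary
reported_gains = {
    "fixed chunking": (0.18, 0.27),
    "semantic/agentic chunking": (0.08, 0.12),
}

for label, (low_gain, high_gain) in reported_gains.items():
    # A larger relative gain implies a lower baseline, so the bounds swap.
    upper = implied_baseline(QASC_F1, low_gain)
    lower = implied_baseline(QASC_F1, high_gain)
    print(f"{label}: implied baseline F1 between {lower:.3f} and {upper:.3f}")
```

Under that assumption, the fixed-chunking baselines would sit at roughly 0.67 to 0.72 and the semantic and agentic baselines at roughly 0.76 to 0.79.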

Ablation studies confirmed that each component contributes meaningfully to the result. Human evaluation by three annotators (Cohen's kappa = 0.82) corroborated that QASC produces more relevant and coherent chunks than the compared methods.
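
For readers unfamiliar with the agreement statistic, the sketch below shows how Cohen's kappa is computed for a pair of annotators. Cohen's kappa is defined for two raters, and the summary does not say how the single value of 0.82 was obtained from three annotators; averaging over annotator pairs, as `mean_pairwise_kappa` does here, is one common convention and is purely an assumption. The label lists are made-up toy data.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two annotators' label sequences of equal length."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: probability both annotators pick the same label independently.
    expected = sum(count_a[l] * count_b[l] for l in count_a.keys() | count_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


def mean_pairwise_kappa(ratings):
    """Average Cohen's kappa over all annotator pairs (assumed convention)."""
    pairs = list(combinations(ratings, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical judgments: 1 = chunk judged relevant, 0 = not relevant.
    annotator_1 = [1, 1, 0, 1, 0, 1, 1, 0]
    annotator_2 = [1, 1, 0, 1, 0, 1, 0, 0]
    annotator_3 = [1, 1, 0, 1, 1, 1, 1, 0]
    print(mean_pairwise_kappa([annotator_1, annotator_2, annotator_3]))
```

Values above roughly 0.8 are conventionally read as strong agreement, which is the range the reported 0.82 falls in.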

Significance

QASC addresses a core limitation of document chunking in RAG systems by integrating the user query directly into the segmentation process. According to the reported results, this improves retrieval precision as well as the coherence and relevance of the retrieved chunks, which makes the method relevant to both information retrieval and natural language processing.

By dynamically constructing chunks based on query similarity, QASC ensures that retrieved context is highly aligned with user intent, thereby enhancing the overall effectiveness of RAG systems in various applications such as knowledge graphs, document summarization, and interactive question-answering scenarios.

Disclaimer: The above content is generated by AI and is for reference only.
