Do Language Models Know What Not to Say? Causal Evidence for Statistical Preemption in LLMs
Neural language models (LLMs) demonstrate a correlation between surprisal patterns and human acceptability judgments in unacceptable sentences, indica
Deep Analysis
Background
This study investigates how large language models (LLMs) acquire knowledge of what is linguistically unacceptable without explicit negative evidence. The authors test statistical preemption, a hypothesis from Construction Grammar under which exposure to a conventional form "preempts" structurally possible but unattested alternatives. Their experiments pit it against the entrenchment hypothesis, under which a verb's overall frequency, in any construction, is what makes its unattested uses less acceptable.
Key Points
Correlation Between Surprisal and Acceptability Judgments: Across four experiments involving 120 English verb-construction pairings (dative, causative, locative), the authors found a strong correlation ($r = 0.79$) between LLM surprisal patterns and human acceptability judgments. This was validated against three independent behavioral datasets.
Competing-Form Frequency Over Overall Verb Frequency: Surprisal patterns were driven by competing-form frequency rather than overall verb frequency, as confirmed through non-circular partial correlations. This suggests that a model's dispreference for an unattested form tracks how often a competing form is used in its place, not how often the verb occurs overall.
Power Law Relationship with Model Size: Preemption sensitivity scaled as a power law with model size, with larger models exhibiting greater sensitivity.
Causal Demonstration Through Fine-Tuning: A controlled fine-tuning intervention demonstrated that manipulating competing-form frequencies shifts preemption behavior in the predicted direction. Reverse-direction controls ruled out frequency-sensitivity confounds, providing causal evidence for statistical preemption over entrenchment.
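The quantities named in the first three points can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. The function names, the choice of a first-order partial correlation, and the log-log least-squares fit are all assumptions, since the summary does not say how the partial correlations or the power-law fit were computed. Surprisal is used in its standard sense: the negative log probability a model assigns to a string.

```python
import math


def surprisal_bits(token_logprobs):
    """Surprisal (in bits) of a string, given per-token natural-log probabilities."""
    return -sum(token_logprobs) / math.log(2)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y with z held constant.

    Hypothetical use: x = surprisal of the unattested form,
    y = competing-form frequency, z = overall verb frequency.
    """
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def fit_power_law(sizes, values):
    """Fit values ~ a * sizes**b by least squares in log-log space; return (a, b)."""
    lx = [math.log(s) for s in sizes]
    ly = [math.log(v) for v in values]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    b = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / sum(
        (u - mx) ** 2 for u in lx
    )
    a = math.exp(my - b * mx)
    return a, b


if __name__ == "__main__":
    # Made-up numbers for illustration only; none come from the paper.
    model_sizes = [1e8, 1e9, 1e10]        # parameter counts (assumed)
    sensitivity = [0.10, 0.20, 0.40]      # hypothetical preemption-sensitivity scores
    a, b = fit_power_law(model_sizes, sensitivity)
    print(f"fitted exponent b = {b:.3f}")
```

Under these assumptions, a partial correlation between surprisal and competing-form frequency that stays large after overall verb frequency is partialled out would favor preemption, while one that collapses toward zero would favor entrenchment. A positive fitted exponent b would correspond to sensitivity growing with model size.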
Significance
These results provide strong evidence that neural language models acquire negative linguistic knowledge through distributional competition, the core mechanism proposed by Construction Grammar. The finding matters because it shows how LLMs can learn grammatical restrictions without explicit negative feedback. It may also offer insight into human language acquisition and suggest new directions for training language models.