Truthful Online Preference Aggregation for LLM Fine-Tuning in Mobile Crowdsourcing
The paper addresses a critical challenge in using crowdsourced human feedback to fine-tune large language models (LLMs) for mobile applications like navigation. When workers strategically misreport their preferences to influence the outcome or maximize payment, standard aggregation methods fail, leading to poor performance (linear regret). The authors model this as a dynamic Bayesian game and propose a novel online weighted aggregation mechanism that dynamically weights workers based on their hi
Deep Analysis
Background
Mobile crowdsourcing platforms rely on feedback from mobile users (workers) to iteratively align and improve AI-generated content, such as traffic predictions from LLMs. A fundamental problem arises because workers are strategic agents who may misreport their feedback to manipulate the aggregated result or inflate their payment. Existing pipelines, such as those using Expectation-Maximization (EM) for weight estimation, are designed for offline or non-strategic settings and cannot efficiently identify the most accurate contributors online. The result is linear regret, O(T): the platform's cumulative performance loss grows in proportion to the number of time slots T, so the average loss per slot never shrinks.
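To make the regret terminology concrete, the block below writes out a generic definition of cumulative regret and what the two growth rates imply for the average loss per slot. The symbols \ell_t and \ell_t^{*} are assumed notation for illustration and may differ from the paper's own definition.

```latex
% Assumed notation (not taken from the source):
%   \ell_t      loss incurred by the platform in time slot t
%   \ell_t^{*}  loss of the benchmark (best choice in hindsight) in slot t
\[
  R(T) \;=\; \sum_{t=1}^{T} \bigl( \ell_t - \ell_t^{*} \bigr)
\]
% Linear regret: the average per-slot loss stays bounded away from zero.
\[
  R(T) = O(T) \;\Longrightarrow\; \frac{R(T)}{T} = O(1)
\]
% Sublinear regret: the average per-slot loss vanishes as T grows.
\[
  R(T) = O(\sqrt{T}) \;\Longrightarrow\; \frac{R(T)}{T} = O\!\left(\frac{1}{\sqrt{T}}\right) \to 0
\]
```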
Key Points
The core contribution is formulating and solving this as a truthful online preference aggregation problem.
- Problem Formulation: The interaction is modeled as a dynamic Bayesian game between the platform and multiple strategic mobile workers. This game-theoretic framework explicitly accounts for the workers' private information (e.g., their true preferences or costs) and their incentives to misreport.
- Proposed Mechanism: Online Weighted Aggregation. The key innovation is a mechanism that assigns and dynamically adjusts a weight to each worker in the preference aggregation process. This adjustment is based on the worker's feedback accuracy over time, which is inferred from the evolving context and outcomes.
- Core Guarantees:
  - Truthfulness: The mechanism is designed so that strategically misreporting feedback is not a beneficial strategy for any worker. Truthful reporting becomes their optimal strategy, which protects the quality of the collected feedback.
  - Sublinear Regret: The platform's learning process achieves a regret bound of O(√T), compared with O(T) for the baseline. The average performance loss per time slot therefore shrinks at rate O(1/√T) as the system runs longer, so per-slot performance approaches that of the optimal benchmark.
- Robust Extension: The mechanism is proven to maintain its O(√T) regret guarantee even under the challenging condition of limited worker feedback per time slot, making it practical for real-world scenarios with sparse participation.
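The idea of weighting workers by their observed accuracy can be sketched with a standard multiplicative-weights (Hedge-style) update. This is not the paper's mechanism, and it does not by itself guarantee truthfulness: it is a minimal illustration under stated assumptions. The function names, the worker types, the step size `eta`, the participation rate, and the assumption that a ground-truth outcome in [0, 1] is observed after every slot are all hypothetical.

```python
import math
import random


def aggregate(weights, reports):
    """Weighted average of the reports from workers who responded this slot."""
    total = sum(weights[w] for w in reports)
    return sum(weights[w] * r for w, r in reports.items()) / total


def update_weights(weights, reports, outcome, eta=0.5):
    """Multiplicative-weights update applied only to workers who reported.

    Each reporting worker's weight is scaled by exp(-eta * squared error);
    workers who gave no feedback this slot keep their current weight.
    """
    new_weights = dict(weights)
    for w, r in reports.items():
        loss = (r - outcome) ** 2  # lies in [0, 1] when r and outcome do
        new_weights[w] = weights[w] * math.exp(-eta * loss)
    return new_weights


def clip01(x):
    """Clamp a value to the interval [0, 1]."""
    return min(1.0, max(0.0, x))


def simulate(T=500, seed=0, participation=0.7):
    """Toy simulation with three hypothetical worker types."""
    rng = random.Random(seed)
    weights = {"accurate": 1.0, "noisy": 1.0, "misreporter": 1.0}
    for _ in range(T):
        truth = rng.random()  # assumed: outcome revealed after the slot
        candidates = {
            "accurate": clip01(truth + rng.uniform(-0.05, 0.05)),
            "noisy": clip01(truth + rng.uniform(-0.4, 0.4)),
            "misreporter": 1.0,  # always pushes the aggregate upward
        }
        # Limited feedback: each worker responds only with some probability.
        reports = {w: r for w, r in candidates.items()
                   if rng.random() < participation}
        if not reports:
            continue  # nobody responded in this slot
        _estimate = aggregate(weights, reports)  # what the platform would use
        weights = update_weights(weights, reports, truth)
    return weights


if __name__ == "__main__":
    final = simulate()
    total = sum(final.values())
    for name, w in final.items():
        print(f"{name}: normalized weight {w / total:.4f}")
```

In this toy setting the consistently accurate worker ends up with almost all of the normalized weight, while the worker who always reports the same extreme value is quickly discounted. For the non-strategic expert setting, updates of this kind are known to give O(√T) regret when the step size is tuned to the horizon; the fixed `eta` here is an arbitrary illustrative choice.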
Significance
This research makes a substantial contribution at the intersection of mechanism design, online learning, and AI.
- Practical Impact: It provides a principled, game-theoretic solution to strategic misreporting in crowdsourced feedback for AI systems, with the goal of LLMs that are better aligned and more accurate for end users.
- Theoretical Advancement: The shift from linear to sublinear regret is a fundamental improvement in online learning efficiency. The paper demonstrates how to incentivize truthfulness in a dynamic, strategic environment while simultaneously optimizing learning performance.
- Methodological Bridge: It successfully integrates dynamic Bayesian game theory with online learning algorithms, offering a template for designing truthful mechanisms in other interactive, multi-agent learning systems beyond mobile crowdsourcing. The experimental validation on real datasets confirms the practical relevance of the theoretical guarantees.