Can AI Guess What You Know? Performance Comparison of Large Language Models for Human Domain Knowledge Estimation From Communication Logs
Large Language Models (LLMs) can infer individual domain knowledge from Slack logs with varying degrees of accuracy. Gemini 2.5 Flash performed best a
Deep Analysis
Background
Organizational productivity often suffers because employees cannot easily identify who on their team holds specific knowledge. Automated expertise mapping with Large Language Models (LLMs) is one possible remedy. The study assesses whether LLMs can accurately infer individual domain knowledge from long-term Slack log data.
Key Points
- Data Collection and Evaluation: The research analyzed 27,188 messages from 43 users. Seven models were evaluated, spanning the Gemini, Claude, and GPT families.
- Performance Metrics: Performance was measured using Mean Absolute Error (MAE), the average absolute gap between a model's estimate and the reference value. Gemini 2.5 Flash achieved the lowest MAE, 21.13%. GPT models showed substantially larger errors in their estimates.
- Message Volume Impact: The study found that estimation accuracy was weakly dependent on message volume, indicating that more text does not automatically lead to better inference.
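The two measurements above can be sketched in a few lines. This is a minimal illustration, not the study's evaluation code: it assumes each user has a reference knowledge score and an LLM estimate on a shared 0-100 scale, and it uses a Pearson correlation between per-user absolute error and message count as one possible way to probe volume dependence. The function names and all numbers are hypothetical.

```python
from math import sqrt


def mean_absolute_error(reference, estimate):
    """Average absolute gap between reference scores and estimates."""
    if not reference or len(reference) != len(estimate):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(r - e) for r, e in zip(reference, estimate)) / len(reference)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical values for four users (NOT data from the study).
reference_scores = [80, 40, 60, 20]      # assumed 0-100 reference scores
llm_estimates = [70, 55, 60, 45]         # assumed 0-100 model estimates
message_counts = [1200, 300, 800, 150]   # messages written by each user

mae = mean_absolute_error(reference_scores, llm_estimates)
abs_errors = [abs(r - e) for r, e in zip(reference_scores, llm_estimates)]
volume_vs_error = pearson(message_counts, abs_errors)

print(f"MAE: {mae:.2f}")
print(f"Correlation between message count and absolute error: {volume_vs_error:.2f}")
```

A correlation near zero would match the summary's claim that accuracy depends only weakly on message volume; a strongly negative value would instead mean that users with more messages are estimated more accurately.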
Significance
- Feasibility and Limitations: These findings demonstrate the feasibility of using LLMs for automated expertise mapping but also highlight current limitations. Specifically, GPT models' poorer performance suggests a need for further model development or adaptation.
- Privacy Considerations: The research underscores the importance of privacy-preserving deployments in practical applications. Ensuring that user data is anonymized and protected is crucial to maintain trust within organizations.
- Data Representation: Richer, structure-aware representations of human knowledge could improve LLMs' ability to accurately infer expertise. This includes leveraging structured data, such as metadata from Slack logs (e.g., timestamps, channels), alongside unstructured text.
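As a rough sketch of how the privacy and data-representation points could combine, the example below turns raw messages into pseudonymized, structure-aware records grouped by user and channel before any text reaches a model. It is an assumption-laden illustration rather than the study's pipeline: the dictionary keys (`user`, `channel`, `ts`, `text`), the `MessageRecord` type, and the keyed-hash pseudonymization scheme are all hypothetical choices.

```python
import hashlib
import hmac
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MessageRecord:
    user_pseudonym: str  # stable pseudonym instead of the real user ID
    channel: str         # channel metadata kept alongside the text
    timestamp: float     # seconds since the epoch
    text: str


def pseudonymize(user_id: str, secret_key: bytes) -> str:
    """Map a user ID to a stable pseudonym with a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


def build_records(raw_messages, secret_key):
    """Convert raw message dicts (hypothetical keys) into MessageRecord objects."""
    return [
        MessageRecord(
            user_pseudonym=pseudonymize(m["user"], secret_key),
            channel=m["channel"],
            timestamp=float(m["ts"]),
            text=m["text"],
        )
        for m in raw_messages
    ]


def group_by_user_and_channel(records):
    """Return {pseudonym: {channel: [texts in chronological order]}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for record in sorted(records, key=lambda r: r.timestamp):
        grouped[record.user_pseudonym][record.channel].append(record.text)
    return grouped


# Hypothetical sample messages (NOT from the study's dataset).
raw = [
    {"user": "U123", "channel": "ml-infra", "ts": "1700000200.0", "text": "Try a smaller batch size."},
    {"user": "U123", "channel": "ml-infra", "ts": "1700000100.0", "text": "The GPU job ran out of memory."},
    {"user": "U456", "channel": "frontend", "ts": "1700000300.0", "text": "The CSS grid fix is merged."},
]
key = b"example-secret-key"  # placeholder; a real key would be stored securely

records = build_records(raw, key)
grouped = group_by_user_and_channel(records)
```

Note that this only hides user IDs. Names, emails, or other identifying details inside the message text would need separate scrubbing, which the sketch does not attempt.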
Key Insights
- Model Variability: Different models perform differently in estimating individual expertise, with Gemini 2.5 Flash demonstrating the best performance.
- Volume vs. Accuracy: More text does not necessarily equate to better inference accuracy, suggesting that model design and data preprocessing are critical factors.
- Privacy Concerns: Privacy-preserving methods must be developed to ensure ethical use of LLMs in organizational settings.
The study's findings have significant implications for both the development and deployment of LLMs in expertise mapping applications, emphasizing the need for ongoing research into improving model accuracy and addressing privacy concerns.