🚀

AI vs. Human vs. Human-AI Collaboration in Forecasting Tasks

Bachelor or Master Thesis

📌 Key facts

Mission: Explore, through a systematic literature review, whether humans, AI, or a combination of both is most effective in forecasting tasks.
When: Start ASAP!
How to apply: Send an e-mail to the address given at the end of this page with your CV, grade report, and initial research summary.

The thesis is intended as a systematic literature review focused on:

identifying studies that compare the performance of AI, humans, and human-AI collaboration (HAI) in forecasting tasks
examining the factors that lead to performance differences (e.g., forecast horizon, volatility)
examining the key strengths and weaknesses of AI, humans, and HAI in forecasting tasks.

Early Comparison of Human vs. Model vs. Collaboration:

Results were inconclusive; however, early meta-analyses found small but consistent advantages for models (Grove et al., 2000; Meehl, 1954)
Humans outperformed models where domain knowledge, up-to-date information, or volatile conditions mattered (Armstrong, 1983; Lawrence et al., 2006; Goodwin & Wright, 2010)

New Capabilities of Later Models:

AI has attracted attention as a powerful forecasting method (Kraus et al., 2020; Feuerriegel et al., 2023)
New Strengths: capturing complex, non-linear relationships and uncovering patterns that were previously beyond the reach of univariate and explanatory models (Kraus et al., 2020; Bommasani et al., 2021; Bubeck et al., 2023; Shome et al., 2024)
New Weaknesses: vulnerability to overfitting, biased training data, and hallucinations (Feuerriegel et al., 2023)

Thus, long-held views on the relative performance of humans, models, and human-AI collaboration require reassessment!

If you are interested, please e-mail patricia.hornstein@tum.de and include:

Your CV
Your grade report
Your preferred start date (must be before mid-October)
An initial search of 5-10 studies that empirically (not just theoretically) compare the forecasting performance of humans vs. AI (vs. human-AI collaboration), apart from the ones already listed in “Further Reading”, with a one-sentence summary per study.

We very much look forward to hearing from you!

Further Reading

Abolghasemi, M., Ganbold, O., & Rotaru, K. (2025). Humans vs. large language models: Judgmental forecasting in an era of advanced AI. International Journal of Forecasting, 41(2), 631-648.
Li, X., Feng, H., Yang, H., & Huang, J. (2024). Can ChatGPT reduce human financial analysts’ optimistic biases? Economic and Political Studies, 12(1), 20-33.
Hsieh, E., Fu, P., & Chen, J. (2024). Reasoning and tools for human-level forecasting. arXiv preprint arXiv:2408.12036.
Schoenegger, P., & Park, P. S. (2023). Large language model prediction capabilities: Evidence from a real-world forecasting tournament. arXiv preprint arXiv:2310.13014.
Pratt, S., Blumberg, S., Carolino, P. K., & Morris, M. R. (2024). Can language models use forecasting strategies? arXiv preprint arXiv:2406.04446.
Karger, E., Bastani, H., Yueh-Han, C., Jacobs, Z., Halawi, D., Zhang, F., & Tetlock, P. E. (2024). ForecastBench: A dynamic benchmark of AI forecasting capabilities. arXiv preprint arXiv:2409.19839.
Schoenegger, P., Tuminauskaite, I., Park, P. S., Bastos, R. V. S., & Tetlock, P. E. (2024). Wisdom of the silicon crowd: LLM ensemble prediction capabilities rival human crowd accuracy. Science Advances, 10(45), eadp1528.
Halawi, D., Zhang, F., Yueh-Han, C., & Steinhardt, J. (2024). Approaching human-level forecasting with language models. Advances in Neural Information Processing Systems, 37, 50426-50468.