Bachelor or Master Thesis
📌 Key facts

- Mission: Explore, through a systematic literature review, whether humans, AI, or a combination of both perform best in forecasting tasks.
- When: Start ASAP!
- How to apply: Send an e-mail (address at the end of this page) with your CV, grade report, and an initial research summary.
💡 Background
This thesis is a systematic literature review focused on:
- identifying studies that compare the performance of AI, humans, and human-AI collaboration (HAI) in forecasting tasks
- examining factors that lead to performance differences (e.g., forecast horizon, volatility)
- examining key strengths and weaknesses of AI, humans, and HAI in forecasting tasks
📚 Further Reading
- Abolghasemi, M., Ganbold, O., & Rotaru, K. (2025). Humans vs. large language models: Judgmental forecasting in an era of advanced AI. International Journal of Forecasting, 41(2), 631-648.
- Li, X., Feng, H., Yang, H., & Huang, J. (2024). Can ChatGPT reduce human financial analysts’ optimistic biases?. Economic and Political Studies, 12(1), 20-33.
- Hsieh, E., Fu, P., & Chen, J. (2024). Reasoning and tools for human-level forecasting. arXiv preprint arXiv:2408.12036.
- Schoenegger, P., & Park, P. S. (2023). Large language model prediction capabilities: Evidence from a real-world forecasting tournament. arXiv preprint arXiv:2310.13014.
- Pratt, S., Blumberg, S., Carolino, P. K., & Morris, M. R. (2024). Can Language Models Use Forecasting Strategies?. arXiv preprint arXiv:2406.04446.
- Karger, E., Bastani, H., Yueh-Han, C., Jacobs, Z., Halawi, D., Zhang, F., & Tetlock, P. E. (2024). ForecastBench: A dynamic benchmark of AI forecasting capabilities. arXiv preprint arXiv:2409.19839.
- Schoenegger, P., Tuminauskaite, I., Park, P. S., Bastos, R. V. S., & Tetlock, P. E. (2024). Wisdom of the silicon crowd: LLM ensemble prediction capabilities rival human crowd accuracy. Science Advances, 10(45), eadp1528.
- Halawi, D., Zhang, F., Yueh-Han, C., & Steinhardt, J. (2024). Approaching human-level forecasting with language models. Advances in Neural Information Processing Systems, 37, 50426-50468.
➕ Additional Information
Early Comparison of Human vs. Model vs. Collaboration:
- Results were inconclusive; however, early meta-analyses found small but consistent advantages for models (Grove et al., 2000; Meehl, 1954)
- Humans outperformed models when domain knowledge, up-to-date information, or volatile conditions mattered (Armstrong, 1983; Lawrence et al., 2006; Goodwin & Wright, 2010)
New Capabilities of Later Models:
- AI has attracted attention as a powerful forecasting method (Kraus et al., 2020; Feuerriegel et al., 2023)
- New strengths: capturing complex, non-linear relationships and uncovering patterns that were previously beyond the reach of univariate and explanatory models (Kraus et al., 2020; Bommasani et al., 2021; Bubeck et al., 2023; Shome et al., 2024)
- New weaknesses: vulnerability to overfitting, biased training data, and hallucinations (Feuerriegel et al., 2023)
Thus, long-held views on the relative performance of humans, models, and human-AI collaboration require reassessment!
📝 How to Apply
If you are interested, please e-mail patricia.hornstein@tum.de and include:
- Your CV
- Your grade report
- Your preferred start date (must be before mid-October)
- An initial search of 5-10 studies that empirically (not just theoretically) compare the forecasting performance of humans vs. AI (vs. human-AI collaboration), excluding those already listed in “Further Reading”, with a one-sentence summary per study.
We look forward to hearing from you!