Belz, A., Thomson, C., Reiter, E., Abercrombie, G., Alonso-Moral, J. M., Arvan, M., Cheung, J., Cieliebak, M., Clark, E., Deemter, K. V., Dinkar, T., Dušek, O., Eger, S., Fang, Q., Gatt, A., Gkatzia, D., González-Corbelle, J., Hovy, D., Hürlimann, M., ... Yang, D. (2023).
Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP. In
The Fourth Workshop on Insights from Negative Results in NLP (pp. 1-10). Association for Computational Linguistics.
https://aclanthology.org/2023.insights-1.1
https://dspace.library.uu.nl/bitstream/handle/1874/429997/2023.insights-1.1.pdf?sequence=1