Cohen’s and Fleiss’ kappa are household names for measuring inter-rater agreement, but both assume that each rater assigns exactly one category per subject. That is a serious limitation in many real-world settings: think of psychiatric diagnoses (patients with several disorders at once) or qualitative coding where the data carry multiple labels.
In our latest paper (see below), we introduce a generalized Fleiss’ kappa that measures agreement among several raters classifying subjects into one or more (hierarchical) categories.
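To make the single-category assumption concrete, here is a minimal sketch of the *classical* Fleiss’ kappa in Python (the function name and example data are my own, not from the paper). Note how every row of the count matrix must sum to the same number of raters, because each rater picks exactly one category per subject; this is precisely the constraint the generalized kappa relaxes. For the actual generalized computation, use the paper and the Excel add-in below.

```python
import numpy as np

def fleiss_kappa(counts):
    """Classical Fleiss' kappa.

    counts: (N subjects, k categories) array where counts[i, j] is the
    number of raters who assigned subject i to category j. Every row must
    sum to the same number of raters n, i.e. one category per rater per
    subject -- the assumption the generalized kappa drops.
    """
    counts = np.asarray(counts, dtype=float)
    N, _ = counts.shape
    n = counts[0].sum()  # raters per subject
    assert np.all(counts.sum(axis=1) == n), "each rater must pick exactly one category"

    P_i = (np.sum(counts ** 2, axis=1) - n) / (n * (n - 1))  # per-subject agreement
    P_bar = P_i.mean()                                        # mean observed agreement
    p_j = counts.sum(axis=0) / (N * n)                        # overall category proportions
    P_e = np.sum(p_j ** 2)                                    # expected chance agreement
    return (P_bar - P_e) / (1 - P_e)

# Hypothetical example: 4 subjects, 3 raters, 3 categories
ratings = [[3, 0, 0],
           [2, 1, 0],
           [0, 3, 0],
           [1, 1, 1]]
print(fleiss_kappa(ratings))  # approx. 0.268
```

A multi-label design (e.g., a patient diagnosed with two disorders by the same rater) would break the row-sum assertion above, which is why a generalization is needed.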
To cite the paper:
Moons, F. & Vandervieren, E. (2025). Measuring agreement among several raters classifying subjects into one or more (hierarchical) categories: A generalization of Fleiss’ kappa. Behavior Research Methods, 57, 287. https://doi.org/10.3758/s13428-025-02746-8
Download the Excel add-in for applying the generalized Fleiss’ kappa:
Remark: first install the add-in, then open the worksheet.
The data repository on OSF: https://osf.io/q5nft/