Aligning NLP models with target population perspectives using PAIR
Population-aligned instance replication
Eckman, S. A., Ma, B., Kern, C., Chew, R., Plank, B., & Kreuter, F. (2025). Aligning NLP models with target population perspectives using PAIR: Population-aligned instance replication. In Proceedings of the 4th Workshop on Perspectivist Approaches to NLP (pp. 100). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2025.nlperspectives-1.9
Models trained on crowdsourced annotations may not reflect population views if the people who work as annotators do not represent the broader population. In this paper, we propose PAIR: Population-Aligned Instance Replication, a post-processing method that adjusts training data to better reflect target population characteristics without collecting additional annotations. Using simulation studies on offensive language and hate speech detection with varying annotator compositions, we show that non-representative annotator pools degrade model calibration while leaving accuracy largely unchanged. PAIR corrects these calibration problems by replicating annotations from underrepresented annotator groups to match population proportions. We conclude with recommendations for improving the representativity of training data and model performance.
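The replication idea can be illustrated with a short sketch. It assumes each annotation carries a single annotator-group label and that the target population shares for those groups are known. The function names `replication_factors` and `replicate` and the rounding rule (keep the most overrepresented group once, round every other group's copy count to the nearest integer) are hypothetical choices for illustration, not the authors' implementation.

```python
from collections import Counter


def replication_factors(annotations, target_shares):
    """Hypothetical helper: integer copy count per annotator group.

    Assumes every group in `annotations` appears in `target_shares`.
    """
    counts = Counter(a["group"] for a in annotations)
    total = sum(counts.values())
    # Ratio of target population share to observed annotator-pool share;
    # underrepresented groups get ratios above those of overrepresented ones.
    ratios = {g: target_shares[g] / (counts[g] / total) for g in counts}
    base = min(ratios.values())
    # Scale so the most overrepresented group is kept exactly once,
    # then round the others to whole copies (at least one copy each).
    return {g: max(1, round(r / base)) for g, r in ratios.items()}


def replicate(annotations, target_shares):
    """Hypothetical helper: duplicate annotations by their group's factor."""
    factors = replication_factors(annotations, target_shares)
    replicated = []
    for a in annotations:
        replicated.extend([a] * factors[a["group"]])
    return replicated


if __name__ == "__main__":
    # Illustrative pool: 80% of annotations from group A, 20% from group B,
    # while the target population is assumed to be split 50/50.
    pool = [{"group": "A", "label": 0}] * 80 + [{"group": "B", "label": 1}] * 20
    target = {"A": 0.5, "B": 0.5}
    print(replication_factors(pool, target))
    print(dict(Counter(a["group"] for a in replicate(pool, target))))
```

In this example, group B's annotations are copied four times, so both groups contribute 80 rows and their shares in the training data match the assumed 50/50 target. No new labels are collected. Because copy counts are rounded to integers, the replicated shares only approximate the targets in general. A weighted loss would be an alternative design, but replicating rows leaves any off-the-shelf training pipeline unchanged.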