Evaluation of text cluster naming with generative large language models
Preiss, S., Arbeit, C. A., Berghammer, A., Bollenbacher, J., McCarthy, J. V., Brom, M. G., Enger, M., Villacorta, N. R., & Straughn, S. H. (2024). Evaluation of text cluster naming with generative large language models. Journal of Data Science, 22(3), 376-392. https://doi.org/10.6339/24-JDS1149
Text clustering can streamline many labor-intensive tasks, but it creates a new challenge: efficiently labeling and interpreting the clusters. Generative large language models (LLMs) are a promising option for automating the naming of text clusters, which could significantly streamline workflows, especially in domains with large datasets and esoteric language. In this study, we assessed the ability of GPT-3.5-turbo to generate names for clusters of texts and compared these to human-generated cluster names. We clustered two benchmark datasets, each from a specialized domain: research abstracts and clinical patient notes. We generated names for each cluster using four prompting strategies (different ways of including information about the cluster in the prompt used to elicit LLM responses). For both datasets, the best prompting strategy beat the manual approach across all quality domains, though name quality varied by prompting strategy and dataset. We conclude that practitioners should consider automated cluster naming when manual naming would create a bottleneck or when the scale of the effort is large enough to benefit from the cost savings of automation, as detailed in our supplemental blueprint for using LLM cluster naming. To get the best performance, however, it is vital to run a small pilot comparing several prompting strategies to identify which performs best on each project's unique data.
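To make the workflow concrete, below is a minimal sketch of LLM-based cluster naming, assuming the OpenAI Python SDK (v1.x). The sampling approach and prompt wording here are illustrative placeholders, not the four prompting strategies evaluated in the paper.

```python
# Minimal sketch of LLM cluster naming (illustrative; not the paper's exact prompts).
import random
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def name_cluster(texts: list[str], n_samples: int = 5) -> str:
    """Ask an LLM for a short descriptive name for a cluster of texts.

    One simple way to include cluster information in the prompt is to
    sample a few representative documents and ask for a common topic.
    """
    sample = random.sample(texts, min(n_samples, len(texts)))
    prompt = (
        "The following documents belong to one cluster. "
        "Reply with a short (2-5 word) name describing their common topic.\n\n"
        + "\n---\n".join(sample)
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep names stable across runs for comparison
    )
    return response.choices[0].message.content.strip()


# Usage: run the same clusters through several prompt variants and compare
# the resulting names in a small pilot before committing to one strategy.
# name_cluster(["Patient reports chest pain ...", "ECG shows ST elevation ..."])
```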