Exploring identification of social determinants of health (SDOH) data in NHLBI BioData Catalyst® (BDC) using embedding-based methods

Madan Krishnamurthy; Leena R. Dave; Timothy S Slade; Laura Haak Marcial; Joel Michael Montavon; Ben Tyndall; Jacqueline Ellis Ortiz; Anne Thessen


Krishnamurthy, M., Dave, L. R., Slade, T. S., Marcial, L. H., Montavon, J. M., Tyndall, B., Ortiz, J. E., & Thessen, A. (2025). Exploring identification of social determinants of health (SDOH) data in NHLBI BioData Catalyst® (BDC) using embedding-based methods. Zenodo. Advance online publication. https://doi.org/10.5281/zenodo.15270779


Abstract

In alignment with the Make America Healthy Again initiative to promote policies that improve public health, this work focuses on identifying survey questions and answers in datasets hosted within the NHLBI BioData Catalyst® (BDC) ecosystem so that they are easier to find in the ecosystem’s cohort-building tool, BDC Powered by PIC-SURE (BDC-PIC-SURE). Leveraging data standards developed by the Gravity Project, we systematically evaluated and ranked 113 high-value datasets within BDC by how many of their variables are represented in the Gravity Project domains. The four datasets with the highest representation were then manually annotated using Simple Knowledge Organization System (SKOS) relations to match survey questions and answers with Gravity Project elements. We used this manually annotated dataset as a “gold standard” to test a proof-of-concept annotation tool that uses embedding-based approaches to match survey-based data with the Gravity Project value set. Performance varied by domain: employment status performed best (F1-Score = 1.0) and financial insecurity performed worst (F1-Score = 0.42). Some domains, such as financial insecurity, material hardship, and medical cost burden, overlapped substantially and were challenging even for human annotators to differentiate. Future work includes refining this workflow by comparing the performance of different embedding algorithms, examining performance on categorical versus continuous variables, determining bins for semantic similarity scores (high, medium, low), and exploring other vocabularies or the annotation of data with multiple domains.
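The matching-and-scoring idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' tool. The domain descriptions, survey questions, gold labels, and the 0.1 threshold are all hypothetical, and a toy bag-of-words vector stands in for a learned embedding model so that the example runs with the standard library only. Each question is assigned to the domain whose description is most similar by cosine similarity, and the assignments are then scored per domain against the gold labels with F1.

```python
import math
from collections import Counter

# Hypothetical keyword descriptions standing in for Gravity Project domain labels.
DOMAINS = {
    "employment status": "employment status employed unemployed job work",
    "financial insecurity": "financial insecurity money income bills afford",
    "material hardship": "material hardship unable pay food housing utilities",
}

def embed(text):
    """Toy bag-of-words vector; a real workflow would use a learned embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def predict_domain(question, threshold=0.1):
    """Return the most similar domain, or None if no score reaches the (assumed) threshold."""
    q = embed(question)
    scores = {name: cosine(q, embed(desc)) for name, desc in DOMAINS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None

def f1_for_domain(gold, predicted, domain):
    """F1 for one domain, treating it as the positive class."""
    tp = sum(g == domain and p == domain for g, p in zip(gold, predicted))
    fp = sum(g != domain and p == domain for g, p in zip(gold, predicted))
    fn = sum(g == domain and p != domain for g, p in zip(gold, predicted))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical survey questions paired with hypothetical manual ("gold") annotations.
annotated = [
    ("what is your current employment status", "employment status"),
    ("do you have enough money to pay your bills", "financial insecurity"),
    ("did you go without food or utilities because of hardship", "material hardship"),
    ("were you unable to afford bills for housing last month", "material hardship"),
]

gold = [label for _, label in annotated]
predicted = [predict_domain(question) for question, _ in annotated]

for domain in DOMAINS:
    print(f"{domain}: F1 = {f1_for_domain(gold, predicted, domain):.2f}")
```

In this toy data the last question shares vocabulary with two domains and is assigned to financial insecurity rather than its gold label, material hardship. That lowers F1 for both domains while employment status stays at 1.0, a small-scale version of the overlap between closely related domains noted in the abstract.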



