Please join us for a Joint GWU/CTSI-CN/Washington DC VA Informatics Seminar taking place Wednesday, January 29th, 11:00am - noon Eastern Standard Time. Dr. Adnan Lakdawala of The George Washington University will present:
Applying Zero Shot Learning to Infer Missing Data and Reduce Bias in the National Violent Death Reporting System
The National Violent Death Reporting System (NVDRS) is a vital resource for understanding violent deaths in the US. However, significant data missingness within the system introduces biases, which can affect downstream analyses and interventions. This study explores the use of Zero Shot Learning (ZSL), a natural language processing (NLP) technique, to infer missing data based on coroner and medical examiner (CME) reports, thereby improving data completeness.
NVDRS data (2003-2021) was first analyzed to assess missingness patterns and potential biases related to demographic characteristics. Our analysis dataset was limited to suicide decedents for whom CME reports were available, that is, cases where the variable 'NarrativeCME' was not missing. Two experienced abstractors manually reviewed the CME reports for a sample of 200 suicide decedents to create gold standard annotations. These annotations were used to test whether NLP-based assignments using ZSL could generate accurate labels for the 'SubstanceCausedDeath1' variable. Various ZSL prompts were tested to evaluate their effectiveness in inferring missing data and reducing demographic discrepancies.
The NVDRS was found to have high missingness, with 74.9% of variables containing missing data. For the 'SubstanceCausedDeath1' variable, 53.8% of values were missing in our analysis dataset (n=328,611), with missingness varying by demographic group. Applying ZSL to the 200 narrative CME reports yielded a best accuracy of 94.5% and an AUC of 96.7% with our best-performing prompt, evaluated against the gold standard annotations.
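For readers unfamiliar with the two reported metrics, the evaluation step described above can be sketched roughly as follows. This is a minimal illustration, not the study's code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for the example and do not come from NVDRS or from the presented work.

```python
from typing import Sequence


def accuracy(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Fraction of predicted labels that match the gold standard labels."""
    if len(gold) != len(pred) or len(gold) == 0:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def roc_auc(gold: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case.
    Tied scores count as half a win."""
    pos = [s for g, s in zip(gold, scores) if g == 1]
    neg = [s for g, s in zip(gold, scores) if g == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = abstractors marked the label as present in the
# narrative, 0 = absent. Scores stand in for a zero-shot classifier's
# confidence for the label; none of these values come from the study.
gold = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]

# Assumed threshold of 0.5 to turn scores into hard label assignments.
pred = [int(s >= 0.5) for s in scores]

print(f"accuracy = {accuracy(gold, pred):.3f}")
print(f"AUC      = {roc_auc(gold, scores):.3f}")
```

Accuracy depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is one reason the two are commonly reported together.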
Dr. Lakdawala is a Postdoctoral Associate at the Biomedical Informatics Center at GW. His interests include health data integration and analytics, mapping health terminologies, and decision support. Dr. Lakdawala is passionate about leveraging informatics to enhance clinical outcomes.