George Margaritis
Title
Automated Data Extraction for Clinical Databases using Natural Language Processing
Abstract
Cardiothoracic surgery departments and clinics in the U.S. rely on central agencies, such as the Society of Thoracic Surgeons (STS), to evaluate their operational performance against that of their peer institutions. Specifically, 97% of U.S. adult cardiac surgery programs collaborate with the STS and transfer their patient data to the STS National Database for quality improvement and risk assessment. However, because patient records consist primarily of unstructured text reports, such data transfers involve a large operational overhead, requiring manual data extraction by teams of experienced data managers. Building on recent breakthroughs in Natural Language Processing, we propose an end-to-end machine learning pipeline that automatically extracts patient data (whether structured or unstructured) from multiple sources, over multiple patient visits, and for multiple target outcomes. Preliminary results on Massachusetts General Hospital data are promising: our methodology achieves up to 99% AUC on common diagnoses and 75-95% AUC on more challenging ones. We believe that an automated pipeline for data extraction can (i) significantly reduce the operational overhead and costs that institutions incur when transferring their data and (ii) increase data consistency and quality while reducing variation caused by human error.
About the Speaker
George Margaritis is currently a PhD candidate at the Operations Research Center of the Massachusetts Institute of Technology (MIT ORC). His research interests lie at the intersection of Machine Learning and Optimization, in particular how Machine Learning can be used with and for Optimization. He is also interested in applications of Deep Learning and Natural Language Processing in the healthcare domain.
George received his undergraduate Diploma degree from the School of Electrical and Computer Engineering of the Technical University of Crete (2021). He wrote his diploma thesis on Federated Learning and Differential Privacy under the supervision of Prof. M. Garofalakis. Previously, George also completed a research internship at the Computer Vision and Robotics Laboratory of the Foundation for Research and Technology - Hellas (CVRL FORTH).