Natural Language Processing – Finding the Missing Link for Oncologic Data, 2022

  • Andra Krauze, Center for Cancer Research, National Cancer Institute, NIH, Building 10, Bethesda, MD, 20892, USA.
  • Kevin Camphausen, Center for Cancer Research, National Cancer Institute, NIH, Building 10, Bethesda, MD, 20892, USA.
Keywords: Natural Language Processing (NLP), Artificial Intelligence (AI), Oncology, Electronic Medical Records (EMR), Clinical research

Abstract

Oncology, like most medical specialties, is undergoing a data revolution, at the center of which lie vast and growing amounts of clinical data in unstructured, semi-structured, and structured formats. Artificial intelligence approaches are widely employed in research in an attempt to harness electronic medical record data to improve patient outcomes. Although clinical oncologic data are collected on a large scale, particularly with the increased implementation of electronic medical records, their use remains limited by missing, incorrect, or manually entered data in registries and by the lack of resources allocated to data curation in real-world settings. Natural language processing (NLP) may provide an avenue to extract data from electronic medical records, and as a result its use in medicine has grown considerably, with applications in documentation, outcome analysis, phenotyping, and clinical trial eligibility. Barriers to NLP persist: findings cannot be aggregated across studies because of differing methods and significant heterogeneity at all levels, and important parameters such as patient comorbidities and performance status still lack implementation in AI approaches. The goal of this review is to provide an updated overview of NLP and the current state of its application in oncology for clinicians and researchers who wish to implement NLP to augment registries and/or advance research projects.

Published
2022-02-16