Overview
Natural Language Processing (NLP) is a cross-disciplinary field of computer science and linguistics that aims to create automated systems for understanding human language.
More health data is being generated than ever before, and much of it is now recorded as written prose (unstructured text) thanks to advances in speech recognition.
Such unstructured text appears in clinic letters and diagnostic reports, which hold vast amounts of patient information that is not captured in structured data such as clinical audits. It is therefore a priority to unlock the information contained in unstructured text and make it available for healthcare research and at the point of patient care.
The Swansea Collaborative of Analysis and NLP Research (SCANR) works with local health boards across Wales to develop NLP applications that enable novel research and improve access to data in clinical practice.
The current portfolio includes information extraction for epilepsy, multiple sclerosis and cardiovascular disease. These NLP applications aim to convert unstructured text into structured datasets that can be linked to the all-Wales datasets already held in the SAIL Databank.
There, the linked data can help answer questions about disease comorbidity and the social burden of disease, and feed into research on precision medicine and personalized patient care.
We are also dedicated to producing open-source software so that these applications are freely available.