
Natural language processing data services for healthcare providers

Abstract

Purpose of Review

Embedding machine learning workflows into real-world hospital environments is essential to ensure model alignment with clinical workflows and real-world data. Many non-healthcare industries undergoing digital transformation have already developed data labelling and data quality management services as a vertically integrated business process.

Recent Findings

In this paper, we describe our experiences developing and implementing a first-of-its-kind clinical NLP (natural language processing) service in the National Health Service (NHS), United Kingdom, using parallel harmonised platforms. We report on our work developing clinical NLP resources and an implementation framework to distil expert clinical knowledge into our NLP models. To date, working with secondary care specialties, we have amassed 26,086 annotations spanning 556 SNOMED CT concepts.

Summary

Our integrated language modelling service has delivered numerous clinical and operational use-cases using named entity recognition (NER). Such services improve the efficiency of healthcare delivery and drive downstream data-driven technologies. We believe it will only be a matter of time before NLP services become an integral part of healthcare providers.


Introduction

Using natural language processing to “unlock” free-text information

With the adoption of electronic health records (EHRs) over the past decades, every patient encounter, investigation, diagnosis, and discussion is recorded and stored. An estimated 80% of EHR data exists in an unstructured format [1]: free-text documents filled with medical jargon, shorthand and abbreviations. To draw valuable insights and trends from clinical text data, it needs to be structured in a format digestible by computational models; only then can we deliver meaningful clinical impact.

Natural language processing (NLP) is a subfield of artificial intelligence focused on the processing and analysis of text. To capture the complexity of medical language, most modern clinical NLP models undergo unsupervised training on large amounts of text data; subsequent fine-tuning and validation require human-labelled, or "annotated", clinical text data. Annotating text data is time-consuming and often very costly. Other industries use data-labelling services (such as Amazon Mechanical Turk, Appen, Scale AI and Upwork) to outsource labelling and abstraction activity to an external entity. Whilst outsourcing annotations can be pragmatic, it is unsuitable for clinical text. Firstly, clinical text contains protected health information and identifiable data, and using third-party labelling services risks a significant data breach. Secondly, outsourced annotators cannot leverage the main benefit of labelling near the source: clinicians know the local context and jargon of what they are labelling, and are arguably also the best informed to understand and label the clinical language used in medical documents. By leveraging domain experts to annotate clinical free text at the source, we are able to curate a gold-standard annotated text dataset which can be used to build, fine-tune or validate a healthcare-domain-specific NLP model.

We describe the experiences of embedding the first natural language processing service in the UK National Health Service (NHS) and instilling clinical knowledge from clinician annotators. We share our published work on how an NLP service can support projects in healthcare, together with our learnings, resources and experience using our NLP software MedCAT (Medical Concept Annotation Tool) to uncover valuable insights [2], reveal clinical trends, automate time-consuming administrative tasks such as clinical coding [3], monitor disease trends [4], anonymise data [5], forecast patient trajectories [6], summarise hospital-wide health data [7] and more (Fig. 1). We share our clinical text annotation best-practice framework (supplementary A) for mapping clinical text to clinical classification systems such as ICD-10 or SNOMED CT, a task also known as named entity recognition and linking (NER + L). To date, working with secondary care specialties, we have amassed 26,086 annotations spanning 556 SNOMED CT concepts (Fig. 2).

Fig. 1

Natural language processing (NLP) case examples using MedCAT (Medical Concept Annotation Tool). (a) Surfacing vital clinical patient information to enable research and drive clinical insights. (b) Visualising and forecasting patient timelines. (c) Personal health information redaction of electronic health records. (a) Using MedCAT to surface the distribution of physical (left) and mental (right) disorders from MIMIC-III (an intensive care unit dataset) [8]. This approach was also used during the COVID-19 pandemic to elucidate risk factors and relationships with medications (ACE inhibitors) to help address international research questions [2]. (b) Visualising patient timelines using MedCAT enables us to better understand disease trajectories and supports public health planning. Patient timeline data can then be used to train an AI model (a generative pretrained transformer) to predict the next probable clinical event (image from Kraljevic et al. [6], Creative Commons Attribution CC BY 4.0). (c) Using MedCAT with clinician fine-tuning, we are able to redact personal health information to preserve patient privacy. We have de-identified and stored over 2 million free-text documents at King's College Hospital using this approach, which enables safer data-sharing for future research and operational projects [5]

Fig. 2

Graph showing monthly expert annotation counts over time (26,086 annotations in total, spanning 556 SNOMED CT concepts), coloured by secondary care department

The technology

The CogStack platform is deployed across King's College Hospital NHS Trust, Guy's and St Thomas' NHS Trust, and South London and Maudsley NHS Trust. The CogStack platform functions as a data lake that draws data (including free text) from multiple electronic health records and other data sources [9]. Documents retrieved through CogStack are subsequently processed using MedCAT (the Medical Concept Annotation Toolkit), an NLP toolkit for named entity recognition and linking (NER + L) (Fig. 3) [10].
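As an illustration of how documents retrieved through CogStack can be structured, below is a minimal sketch of running a fine-tuned MedCAT model pack over a small batch of documents in Python. The model pack path and the example documents are placeholders, and the exact API may differ between MedCAT versions.

from medcat.cat import CAT

# Load a fine-tuned MedCAT model pack (path is a placeholder)
cat = CAT.load_model_pack("models/snomed_modelpack.zip")

documents = {
    "doc_001": "Patient has a history of afib and type 2 diabetes mellitus.",
    "doc_002": "No evidence of atrial fibrillation on this admission.",
}

# Extract linked SNOMED CT concepts from each document
for doc_id, text in documents.items():
    result = cat.get_entities(text)
    for ent in result["entities"].values():
        print(doc_id, ent["cui"], ent["pretty_name"])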

Fig. 3

The NLP infrastructure - free-text entries within multiple clinical record systems / electronic patient records are ingested and indexed into CogStack. Ingested documents can then be structured using MedCAT (Medical Concept Annotation Tool), an AI/NLP automated extraction pipeline for named entity recognition and linking. The code, together with tutorials, is available on GitHub (https://github.com/CogStack/MedCAT)

Key learnings from implementation and delivery of a natural language processing service in the NHS

Distilling and encoding domain-expert knowledge

Expert clinical knowledge combines knowledge about diseases, diagnostics and clinical decision-making with the implicit working knowledge of clinical workflows, styles of clinical practice and clinical intent. Distilling these forms of expert knowledge into a model allows scalable application of that knowledge to large volumes of healthcare data. For NLP, this task means labelling and annotating text using the expert's linguistic experience and contextual knowledge of clinical workflows and intent.

Aligning team incentives to drive success

Successful projects require clinician engagement and time; in particular, clinicians have to be actively committed to the task of annotating and the iterative process of correcting annotations, which can often be repetitive and tedious. It is therefore important to set clear objectives and milestones before the project commences. During the process, the NLP service should give annotators support and guidance to ensure annotation alignment between annotators. The most successful projects are those where clinicians have identified a clear clinical or research problem that they wish to solve.

Capturing annotations accurately and concisely

The MedCAT annotation interface was designed by developers and researchers in conjunction with clinicians to make annotating intuitive [10]. The MedCAT trainer uses an online learning process that recognises the concepts being annotated and automatically flags similar concepts in subsequent documents (Fig. 4) [11]. This helps reduce annotator fatigue, as well as the chance of a term being missed in a long span of text.

Fig. 4

User interface of MedCAT trainer. Clinicians mark concepts or clinical codes detected by the model as correct or incorrect, or offer an alternative or manual annotation [11]

A framework for implementing natural language processing as a service

Below we outline an NLP implementation framework spanning scope definition, annotation guidelines, model deployment and real-world validation. The annotation framework establishes an annotation standard that can be used to obtain consistent, high-quality medical annotations, which are then used for model building, fine-tuning and validation (Fig. 5).

Fig. 5

A framework for implementing natural language processing to generate real world data (RWD). In steps 5–7 we use a training-correction cycle to optimise the quality of annotations and model training

1. Scope definition

The first step is identifying the scope and nature of the project: is it a clinical audit or service evaluation, a research project, or an operational evaluation? Appropriate approvals should be obtained before the project commences.

2. Cohort selection

In this step, we select the patient or population group of interest and, optionally, a comparison or control group. Clinical departments or research teams may opt to supply an existing curated patient list that has been collected manually or exists in a database. Alternatively, the entirety of the patient text records can be coded using the NLP-model approach discussed below [7], and the cohort can then be filtered based on clinical codes of interest.
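As a minimal sketch of code-based cohort filtering (assuming the NLP outputs have already been aggregated into per-patient sets of codes; the patient identifiers and codes below are illustrative only):

# Hypothetical per-patient sets of SNOMED CT codes produced by the NLP pipeline
patient_codes = {
    "patient_A": {"49436004", "44054006"},  # atrial fibrillation, type 2 diabetes
    "patient_B": {"38341003"},              # hypertension
}

codes_of_interest = {"49436004"}  # atrial fibrillation (disorder)

# Keep patients with at least one code of interest
cohort = [pid for pid, codes in patient_codes.items() if codes & codes_of_interest]
print(cohort)  # ['patient_A']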

3. Document retrieval

The next step is to retrieve the relevant documents and metadata for the patient cohort that help address the project scope and aims. Important considerations include (a minimal retrieval sketch follows this list):

  • Document selection - which documents within the EHR contain the data of interest?

  • Metadata selection - depending on what is captured in the EHR, we may wish to retrieve specific structured metadata to help address the scope, e.g. medications, department numbers, allergies, investigation orders and results.

  • Timeframe - what timeframe of documents do we wish to extract (all historical patient documents, or a defined time period)?

  • Pre-processing and data privacy [12] - redaction/anonymisation of sensitive patient data, and pre-processing of free text to reduce noise.
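Since CogStack indexes documents into a search engine, document retrieval can be expressed as a query over that index. Below is a minimal, hedged sketch of such a query in Python; the index name, field names and client call are illustrative and will differ between deployments.

# Illustrative query structure for retrieving a cohort's documents over a timeframe.
# Field names ("patient_id", "document_type", "document_date") are placeholders.
query = {
    "bool": {
        "filter": [
            {"terms": {"patient_id": ["patient_A", "patient_B"]}},
            {"terms": {"document_type": ["discharge_summary", "clinic_letter"]}},
            {"range": {"document_date": {"gte": "2020-01-01", "lte": "2023-12-31"}}},
        ]
    }
}

# With the official Elasticsearch Python client, this query could be passed to the
# search API, for example:
# from elasticsearch import Elasticsearch
# es = Elasticsearch("http://localhost:9200")
# results = es.search(index="cogstack_documents", query=query, size=1000)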

4. Code selection / creation

This step involves identifying the information, or "named entities", that we wish to extract from the clinical free text. The goal is to take free text and structure it into a machine-readable format. We are able to structure free text according to the selected concepts; usually clinical codes are used (such as SNOMED CT, ICD-10 or UMLS). For example, we can map synonymous mentions of "atrial fibrillation", "afib" or "AF" to the SNOMED CT code 49436004 | Atrial fibrillation (disorder).

It is particularly important to decide on the depth of the selected codes. For example, if you are interested in diabetes mellitus as a concept: are you mapping all mentions of diabetes mellitus as a general disease, or do you wish to annotate detailed classification codes to distinguish between type 1 and type 2 diabetes, or complications such as diabetic retinopathy and diabetic nephropathy?
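As a minimal sketch of the mapping idea (the synonym list and codes are illustrative; in practice MedCAT links mentions against the full SNOMED CT terminology rather than a hand-written dictionary):

# Illustrative synonym-to-concept map; real NER + L uses the full terminology
concept_map = {
    "atrial fibrillation": "49436004",
    "afib": "49436004",
    "af": "49436004",
    "type 2 diabetes mellitus": "44054006",
    "t2dm": "44054006",
}

def link_mention(mention):
    """Return the SNOMED CT code for a mention, if recognised."""
    return concept_map.get(mention.lower().strip())

print(link_mention("AFib"))  # 49436004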

5. Concept annotation

This step is often the most important and labour-intensive. Annotating involves working with expert clinicians to label clinical free text and create a ground-truth dataset. The goal of annotation is to consistently label concepts within free text with classification codes; this creates a dataset on which the NLP model can be trained.

For example, consider the phrase: "This patient has a bronchial adenocarcinoma affecting the left lower lobe. The cancer was recently diagnosed in Oct 2023". If one labelled "cancer" with the concept <lung cancer>, the annotation would be project-specific, while if one labelled "cancer" with the concept <malignancy>, the annotation would be generalisable.

Consistency between annotators is essential for model learning. When there is more than one annotator, it is important to evaluate inter-annotator agreement: all annotators are given an overlapping set of documents to annotate, and the degree of agreement between their annotations is calculated. Additional contextual concept information can be added as meta-annotations (supplementary A).
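As a minimal sketch of measuring inter-annotator agreement (the label arrays are illustrative; the framework does not prescribe a specific statistic, but Cohen's kappa is a common choice for two annotators):

from sklearn.metrics import cohen_kappa_score

# Per-mention decisions from two annotators over the same overlapping documents
# (1 = concept present/correct, 0 = absent/incorrect); values are illustrative
annotator_1 = [1, 1, 0, 1, 0, 1, 1, 0]
annotator_2 = [1, 0, 0, 1, 0, 1, 1, 1]

kappa = cohen_kappa_score(annotator_1, annotator_2)
print(f"Cohen's kappa: {kappa:.2f}")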

6. and 7. Model training and quality checking

The next step is to train and optimise the performance of the NLP model and perform subsequent quality checking [12]. We use an iterative training-correction cycle to validate that extracted annotations are correct and optimised [12]. The process is as follows (a minimal sketch of the comparison step is shown after the list):

  I. We split the clinician-annotated documents into a training set and a test set (classically an 80:20 split). We then train an NLP model using the training set.

  II. We run the NLP model over the clinician-annotated documents and compare the model outputs to the clinician annotations.

  III. We find all false positive (FP) and false negative (FN) examples between the NLP model and the clinician annotators.

  IV. We manually review each FP and FN. If the FP/FN is a mistake or omission by the annotator, we amend the annotations and return to step (I). If there are no correctable FP/FNs, we continue to step (V).

  V. Once all FP/FNs are corrected, the final iteration of our trained NLP model is our optimised model. The performance of this model can now be assessed against the test set and NER + L performance calculated.
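A minimal sketch of the comparison in steps II-III, deriving precision, recall and F1 from clinician and model annotations (annotations are simplified to (document, start, end, code) tuples and the values are illustrative; MedCAT reports these metrics itself, so this is purely to show the logic):

# Each annotation is a (document_id, start_char, end_char, concept_code) tuple
clinician = {("doc_001", 14, 33, "49436004"), ("doc_002", 5, 17, "38341003")}
model = {("doc_001", 14, 33, "49436004"), ("doc_001", 40, 52, "44054006")}

true_positives = clinician & model
false_positives = model - clinician   # model found something the clinician did not mark
false_negatives = clinician - model   # clinician marked something the model missed

precision = len(true_positives) / max(len(model), 1)
recall = len(true_positives) / max(len(clinician), 1)
f1 = 2 * precision * recall / max(precision + recall, 1e-9)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
print("FPs to review:", false_positives)
print("FNs to review:", false_negatives)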

8. Run model + real-world validation

The fine-tuned NLP model is then run over the entire corpus of documents of interest to produce an aggregated tally of mentions of each medical concept within each document, which we refer to as "MedCAT mentions". We can aggregate all relevant MedCAT mentions for each patient over time. In a similar process to a clinician reading through and interpreting a patient's previous history, we need to determine a threshold of MedCAT mentions above which we can infer the true presence of a concept with a degree of certainty. For example, a single mention of the concept <hypertension> in an individual's entire document history may be noise rather than signal: they may have suffered a one-off hypertensive episode driven by anxiety or white-coat hypertension, rather than chronic hypertension.
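A minimal sketch of this aggregation-and-threshold step (the mention counts and the threshold of two mentions are illustrative; in practice the threshold is chosen per concept during validation):

from collections import Counter

# Illustrative (patient_id, concept_code) pairs, one per extracted mention
mentions = [
    ("patient_A", "38341003"), ("patient_A", "38341003"), ("patient_A", "38341003"),
    ("patient_B", "38341003"),  # a single, possibly incidental, mention
]

mention_counts = Counter(mentions)
THRESHOLD = 2  # minimum mentions required to infer true presence of the concept

inferred_present = {
    (patient, code) for (patient, code), n in mention_counts.items() if n >= THRESHOLD
}
print(inferred_present)  # {('patient_A', '38341003')}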

Once a threshold has been set, each concept or disease inferred for a patient needs to be validated against a "ground truth" dataset in the real world. The ground-truth dataset is created by clinicians manually reading through patients' documents and recording the presence or absence of the concepts of interest. This final process allows confirmation and validation of the MedCAT pipeline for concept or disease inference. An example of applying this framework to a clinical audit is shown in Fig. 6.
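A minimal sketch of this patient-level validation (the labels are illustrative; the NLP-inferred presence of a concept is compared with clinician chart review to give sensitivity and specificity):

# 1 = concept present, 0 = absent, per patient; values are illustrative
ground_truth = {"patient_A": 1, "patient_B": 0, "patient_C": 1, "patient_D": 0}
nlp_inferred = {"patient_A": 1, "patient_B": 0, "patient_C": 0, "patient_D": 0}

tp = sum(ground_truth[p] == 1 and nlp_inferred[p] == 1 for p in ground_truth)
fn = sum(ground_truth[p] == 1 and nlp_inferred[p] == 0 for p in ground_truth)
tn = sum(ground_truth[p] == 0 and nlp_inferred[p] == 0 for p in ground_truth)
fp = sum(ground_truth[p] == 0 and nlp_inferred[p] == 1 for p in ground_truth)

sensitivity = tp / max(tp + fn, 1)
specificity = tn / max(tn + fp, 1)
print(f"Sensitivity={sensitivity:.2f} Specificity={specificity:.2f}")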

Fig. 6

Example of using the MedCAT NLP pipeline for AI-assisted clinical coding for audit purposes, as previously published. Clinical coding is the task of transforming medical information in a patient's health records into structured codes for downstream analysis and organisational planning; an NHS coding department of around 25–30 coders typically codes over 20,000 cases per month [13]. Our approach could save clinical coders significant manual labour and increase the performance and depth of coding [3]

Future directions - building a large language model for the UK national health service (NHS)

We have demonstrated how to leverage annotators at the source to distil expert clinical knowledge into our NLP models. With the recent popularisation of large language models (LLMs), there has been significant excitement about their potential in healthcare [14, 15]. Here we describe future directions for distilling expert clinical knowledge to fine-tune a foundational large language model grounded in healthcare.

Fig. 7

A diagram demonstrating an overview to fine-tuning large language models (LLMs) by leveraging clinician annotators and grounding in medical datasets

One key step towards a grounded foundational model is to go beyond published clinical knowledge and instil real-world healthcare knowledge. We recently demonstrated a novel approach, published by Kraljevic et al., in which a clinician-fine-tuned MedCAT model is used to extract clinical annotations and construct patient timelines [6]. These timelines can be seen as foundational data for foundation models: real-world clinical knowledge distilled from electronic health records. The next step is to align the LLM with clinician intent and preference to overcome limitations such as hallucinations, inconsistent outputs and misalignment [16,17,18]. One approach is to train a reward model through reinforcement learning from human (clinician) feedback (RLHF), where clinician annotators rank and re-write LLM outputs. Generating a reward model through clinician feedback is a relatively new field, and one that future work should explore.
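As a minimal sketch of the reward-modelling idea (a standard pairwise ranking loss over clinician-preferred versus rejected responses, written in PyTorch; the tiny linear reward model and random embeddings are placeholders, not the architecture used in practice):

import torch
import torch.nn.functional as F

# Placeholder reward model: maps a response embedding to a scalar reward
reward_model = torch.nn.Linear(768, 1)

# Illustrative embeddings of clinician-preferred and rejected LLM responses
chosen_emb = torch.randn(4, 768)
rejected_emb = torch.randn(4, 768)

r_chosen = reward_model(chosen_emb)
r_rejected = reward_model(rejected_emb)

# Bradley-Terry style pairwise loss: push preferred rewards above rejected ones
loss = -F.logsigmoid(r_chosen - r_rejected).mean()
loss.backward()
print(float(loss))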

As a proof of concept, we have built NHS-LLM, based on the LLaMA 13B foundation model (Meta) [19] and further grounded through instruction tuning with datasets such as the NHS UK conditions website and NICE guidelines (Fig. 7). From a qualitative perspective, NHS-LLM produces more grounded outputs than proprietary models such as ChatGPT and GPT-4. We aim to build on this approach to create an LLM that can be used in clinical settings: one that is less prone to hallucinations and more grounded in facts [20] (Fig. 7).

Conclusion

We have described our experiences as the first embedded natural language processing service operating within the NHS, using approaches such as named entity recognition to support projects and "unlock" data in electronic health records. As an internal healthcare service, we are able to navigate traditional barriers to AI in healthcare such as data access, privacy, and access to clinician annotators at the source, and in turn we are able to deliver service-changing impact. We have laid out a blueprint for organising and leveraging clinician expertise to annotate and collaboratively instil their knowledge into NLP models (Fig. 8). We believe it will only be a matter of time before NLP services become an integral part of healthcare providers.

Fig. 8

A table summarising key learnings from implementing the first-of-its-kind natural language processing data service in the National Health Service (NHS). The learnings combine perspectives from our multidisciplinary team of clinicians, developers, and researchers

Data availability

Project name: Medical Concept Annotation Tool (MedCAT). Project home page: https://github.com/CogStack/MedCAT. Operating system(s): Windows, Linux, MacOS. Programming language: Python. Other requirements: Python 3.0 or higher. License: Elastic 2.0. Any restrictions to use by non-academics: The licensor grants you a non-exclusive, royalty-free, worldwide, non-sublicensable, non-transferable license to use, copy, distribute, make available, and prepare derivative works of the software, in each case subject to the limitations and conditions specified in the license document.

References

  1. Kong H-J. Managing unstructured Big Data in Healthcare System. Healthc Inf Res. 2019;25:1–2.


  2. Bean DM, et al. Angiotensin-converting enzyme inhibitors and angiotensin II receptor blockers are not associated with severe COVID-19 infection in a multi-site UK acute hospital trust. Eur J Heart Fail. 2020;22:967–74.


  3. Shek A, et al. Machine learning-enabled multitrust audit of stroke comorbidities using natural language processing. Eur J Neurol. 2021;28:4090–7.


  4. Teo JTH, et al. Real-time clinician text feeds from electronic health records. NPJ Digit Med. 2021;4:35.


  5. Kraljevic Z, et al. 2023 IEEE 11th International Conference on Healthcare Informatics (ICHI) (IEEE, 2023). https://doi.org/10.1109/ichi57859.2023.00098

  6. Kraljevic Z, et al. Foresight—a generative pretrained transformer for modelling of patient timelines using electronic health records: a retrospective modelling study. Lancet Digit Health. 2024;6:e281–90.


  7. Bean DM, Kraljevic Z, Shek A, Teo J, Dobson RJB. Hospital-wide natural language processing summarising the health data of 1 million patients. PLOS Digit Health. 2023;2:e0000218.


  8. Johnson AEW et al. MIMIC-III, a freely accessible critical care database. Sci Data 3, (2016).

  9. Jackson R, et al. CogStack - experiences of deploying integrated information retrieval and extraction services in a large National Health Service Foundation Trust Hospital. BMC Med Inf Decis Mak. 2018;18:1–13.


  10. Kraljevic Z, et al. MedCAT - Medical Concept Annotation Tool. (2019) https://doi.org/10.48550/ARXIV.1912.10166

  11. Searle T, Kraljevic Z, Bendayan R, Bean D, Dobson R. MedCATTrainer: a biomedical free text annotation interface with active learning and research use case specific customisation. (2019) https://doi.org/10.48550/ARXIV.1907.07322

  12. Kraljevic Z, et al. Validating transformers for redaction of text from electronic health records in real-world healthcare. (2023) https://doi.org/10.48550/ARXIV.2310.04468

  13. Dong H et al. Automated clinical coding: what, why, and where we are? NPJ Digit Med 5, (2022).

  14. Brown TB, et al. Language models are few-shot learners. (2020) https://doi.org/10.48550/ARXIV.2005.14165

  15. Singhal K, et al. Large language models encode clinical knowledge. Nature. 2023;620:172–80.


  16. Au Yeung J, et al. AI chatbots not yet ready for clinical use. Front Digit Health. 2023;5:1161098.


  17. Maynez J, Narayan S, Bohnet B, McDonald R. On faithfulness and factuality in abstractive summarization. (2020) https://doi.org/10.48550/ARXIV.2005.00661

  18. Bai Y, et al. Training a helpful and harmless assistant with reinforcement learning from human feedback. (2022) https://doi.org/10.48550/ARXIV.2204.05862

  19. Touvron H, et al. LLaMA: Open and efficient foundation language models. (2023) https://doi.org/10.48550/ARXIV.2302.13971

  20. Zeljko, A Large Language Model for Healthcare. AI for Healthcare https://aiforhealthcare.substack.com/p/a-large-language-model-for-healthcare (2023).


Funding

JTT has previously received research grant support from Innovate UK, NHSX, Office of Life Sciences, NIHR, Health Data Research UK, Bristol Myers Squibb and Pfizer; has received honoraria from Bayer, Bristol Myers Squibb and Goldman Sachs; holds stock in Amazon, Alphabet and Nvidia; and receives royalties from Wiley-Blackwell Publishing.

Author information

Authors and Affiliations

Authors

Contributions

JAY conception, design, drafting, writing. AS drafting, writing, review. TS drafting, review. ZK drafting, review, design. VD drafting, review. MR review. MAA review, AF review. VO review. MA-A review. BR review. JTT conception, design, review.

Corresponding author

Correspondence to Joshua Au Yeung.

Ethics declarations

Ethics approval and consent to participate

NER + L experiments use freely available open-access datasets accessible by data owners. SNOMED-CT and UMLS licences were obtained by all users at all hospital sites. Site specific ethics is listed below. KCH: This project operated under London South East Research Ethics Committee approval (reference 24/LO/0057) granted to the King’s Electronic Records Research Interface (KERRI); specific work on research on natural language processing for clinical coding was reviewed with expert patient input on the KERRI committee with Caldicott Guardian oversight. Governance is provided for all projects and dissemination through a patient-led oversight committee. Individual consent from participants was not required as the data is de-identified and used in a data-secure format, with all personal health information redacted. All patients who do not wish for their data to be used have the choice of national data opt-out (NDOO) which excludes their data from being used for subsequent research or audit projects.

Consent for publication

Not applicable: No informed consent was sought from individual persons, as no individual persons’ data is presented in this manuscript.

Competing interests

JTT has previously received research grant support from Innovate UK, NHSX, Office of Life Sciences, NIHR, Health Data Research UK, Bristol Myers Squibb and Pfizer; has received honoraria from Bayer, Bristol Myers Squibb and Goldman Sachs; holds stock in Amazon, Alphabet and Nvidia; and receives royalties from Wiley-Blackwell Publishing. The other authors have no conflicts of interest to declare.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article


Cite this article

Au Yeung, J., Shek, A., Searle, T. et al. Natural language processing data services for healthcare providers. BMC Med Inform Decis Mak 24, 356 (2024). https://doi.org/10.1186/s12911-024-02713-x
