Table 1 Glossary of AI and machine learning terminology sorted alphabetically

From: Transformer models in biomedicine

Concept

Definition

Classification head

One or multiple layers attached on top of the model that output predictions; they typically accept embeddings and perform classification, prediction, or other tasks (an illustrative code sketch follows this table).

Contrastive learning

A technique that can improve the performance of machine learning models by learning representations that pull similar samples closer together and push dissimilar ones further apart in the embedding space (an illustrative code sketch follows this table).

Decoder

Like the encoder, the decoder in the transformer architecture captures the relevant context of the input data; it then generates an output sequence by translating the high-dimensional embedded information step by step. It can be used for many generative tasks performed by models such as GPT, BioGPT, and CEHR-GPT.

Distant supervision

A machine learning technique that utilizes indirect labels to generate or augment training datasets.

Domain-specific transformer-based models

These specialized models have been pre-trained or fine-tuned on data from a specific domain such as biomedicine.

Embeddings

Numerical representations of data that allow them to be processed with machine learning algorithms. Embeddings can be generated for various kinds of objects such as words, proteins, diagnosis codes, and medications (an illustrative code sketch follows this table).

Encoder

As part of the transformer architecture, the encoder captures the relevant context of the input data and generates a high-dimensional embedding that can be used in many downstream tasks (such as text classification, graph node classification, and image segmentation). Transformer-based models such as BERT are based solely on the encoder.

Explainable AI (XAI)

XAI refers to the field of designing algorithms that can explain the predictions of machine learning models, providing more insight into their decision-making process.

Generative modeling

Refers to the process of generating new samples from a distribution learned from the underlying training data. Pre-trained, decoder-based transformer models can generate new text, protein sequences, structured EHRs, and images.

Graph neural networks

These are neural networks specifically designed to learn from homogeneous and heterogeneous knowledge graphs and can perform tasks such as node classification, link prediction, and graph or sub-graph classification. They learn from the underlying structure and interconnections in graphs.

Fine-tuning phase

In this phase, the pre-trained model is subsequently tuned with additional supervised training for a specific task using labeled data (an illustrative code sketch follows this table).

Large language models (LLM)

LLMs are large neural networks based on the transformer architecture and pre-trained on vast amounts of textual data. They are capable of understanding and generating natural language.

Machine reading comprehension task

This task aims to train algorithms to comprehend and extract relevant information from textual data. The algorithm takes a document and a query as input, and the goal is to derive the correct answer from the provided text. One common application is span extraction, where the output is the span of text in the document that answers the query. Additional knowledge can also be incorporated when defining the query to improve performance.

Machine translation

The task of automatically translating text from one human language to another using machine learning algorithms.

Pre-training phase

In this phase, machine learning models leverage data to learn a general representation of the underlying objects (such as text or images) in a self-supervised manner.

Self-attention mechanism

A technique utilized in deep neural networks that allows the model to focus on all or different parts of the input while processing it. For instance, to learn the context of each word in a sentence, the model attends to every other word. This technique has proven beneficial for natural language processing and other sequential data (an illustrative code sketch follows this table).

Self-supervised learning

A machine learning approach in which intrinsic data properties are utilized to create pseudo-labels for the data itself. The models are then trained on this self-labeled data to learn the underlying patterns and relationships.

Sequence labeling task

The objective of this task is to assign a label to each unit of a sequence. For instance, words or tokens in a sentence can be labeled as biological concepts. Similarly, in the case of protein sequences, amino acids can be assigned labels for secondary structure elements (such as alpha-helix, beta-strand, or coil) (an illustrative code sketch follows this table).

Sequence-to-sequence learning

Models trained with this technique are designed to map input sequences from one domain to output sequences of another domain. Summarization or translation of text, or predicting the secondary structure from a protein sequence, are typical examples of sequence-to-sequence learning tasks.

Relational graph attention networks

A type of graph neural network that, as the name suggests, applies the self-attention mechanism to relational graph data, modeling the different types of relationships embedded in the graph.

Representation learning

It enables data such as text, images, and protein sequences to be processed with mathematical operations by representing them as compact, dense vectors. These vectors, also called embeddings, carry the relevant features of the input data.

Transfer learning

With this technique, a model designed for one task is reused or fine-tuned to perform a different but related task, leveraging its pre-trained knowledge to improve learning efficiency and potentially achieve better performance with less data.

Transformer

A deep neural network architecture that utilizes self-attention to process sequential data (such as text, protein sequences, and images) in a modular and scalable encoder-decoder design.

Transformer-based models

Deep learning models that employ the self-attention mechanism to process data. Their structure utilizes the encoder, the decoder, or both parts of the transformer architecture.

Vision transformer (ViT)

A transformer architecture specifically designed for computer vision tasks.
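
The following minimal sketches illustrate selected glossary entries; they are not taken from the article, and all dimensions, names, and data in them are illustrative assumptions. First, a classification head as one or more layers placed on top of a pre-trained encoder, mapping a pooled embedding to class logits (PyTorch sketch, assuming a 768-dimensional encoder output and three classes):

# Hypothetical classification head: a small layer on top of a pre-trained encoder's
# pooled embedding (e.g., the [CLS] vector), mapping it to class logits.
import torch
import torch.nn as nn

class ClassificationHead(nn.Module):
    def __init__(self, hidden_size: int, num_classes: int, dropout: float = 0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, embedding: torch.Tensor) -> torch.Tensor:
        # embedding: (batch, hidden_size) pooled output of the encoder
        return self.classifier(self.dropout(embedding))

head = ClassificationHead(hidden_size=768, num_classes=3)
logits = head(torch.randn(4, 768))   # 4 examples -> (4, 3) class logits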
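
Contrastive learning can be sketched with an InfoNCE-style loss, in which embeddings of two views of the same sample form positive pairs and all other in-batch samples serve as negatives; the temperature and batch size below are illustrative assumptions:

# Illustrative contrastive (InfoNCE-style) loss: embeddings of matching pairs are
# pulled together, while all other in-batch samples are pushed apart.
import torch
import torch.nn.functional as F

def info_nce_loss(z_a: torch.Tensor, z_b: torch.Tensor, temperature: float = 0.1):
    """z_a, z_b: (batch, dim) embeddings of two views of the same samples."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.T / temperature      # cosine similarities of all pairs
    targets = torch.arange(z_a.size(0))     # the matching pair lies on the diagonal
    return F.cross_entropy(logits, targets)

loss = info_nce_loss(torch.randn(8, 128), torch.randn(8, 128))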
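
Embeddings can be illustrated as a lookup that maps discrete objects (words, diagnosis codes, medications) to dense numerical vectors; the tiny vocabulary below is hypothetical:

# Toy illustration of embeddings: each discrete object is mapped to a dense vector
# that machine learning models can operate on; the vectors are learned during training.
import torch
import torch.nn as nn

vocab = {"aspirin": 0, "metformin": 1, "I10": 2, "E11": 3}   # hypothetical vocabulary
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=8)

ids = torch.tensor([vocab["metformin"], vocab["E11"]])
vectors = embedding(ids)    # (2, 8) dense vectors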
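
The fine-tuning phase (and, more generally, transfer learning) can be sketched as supervised training of a task-specific head together with a pre-trained encoder; the encoder below is a stand-in module rather than an actual pre-trained model, and the labeled data are random placeholders:

# Hedged sketch of the fine-tuning phase: a pre-trained encoder (here a stand-in
# module) plus a task-specific head are trained on labeled examples.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64))  # stand-in for a pre-trained encoder
head = nn.Linear(64, 2)                                                    # task-specific classification head
optimizer = torch.optim.AdamW(list(encoder.parameters()) + list(head.parameters()), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# Hypothetical labeled fine-tuning data: 100 samples, 32 features, 2 classes.
x, y = torch.randn(100, 32), torch.randint(0, 2, (100,))

for epoch in range(3):          # a few supervised passes over the labeled data
    optimizer.zero_grad()
    logits = head(encoder(x))
    loss = loss_fn(logits, y)
    loss.backward()
    optimizer.step()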
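
The self-attention mechanism corresponds to scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, in which every position attends to every other position; the NumPy sketch below uses randomly initialized projection matrices purely for illustration:

# Minimal scaled dot-product self-attention sketch (illustrative, not from the article).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_k) projection matrices."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # pairwise attention scores
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V                        # weighted sum of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))                  # e.g., 5 tokens with 16-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(16, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)    # (5, 8)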
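
Finally, a sequence labeling task assigns one label per unit of the sequence, for example tagging every token of a sentence with a (hypothetical) B/I/O entity scheme:

# Illustrative sequence labeling: every token gets its own label distribution.
import torch
import torch.nn as nn

token_ids = torch.randint(0, 1000, (1, 12))   # one sentence of 12 token ids
embed = nn.Embedding(1000, 64)
tagger = nn.Linear(64, 3)                     # 3 labels: O, B-ENTITY, I-ENTITY
logits = tagger(embed(token_ids))             # (1, 12, 3): one label distribution per token
labels = logits.argmax(dim=-1)                # predicted label index for each token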