Table 5 Overview of biological sequencing analysis studies

From: Transformer models in biomedicine

| Study | Data sources | Model architecture | Biomedical tasks |
| --- | --- | --- | --- |
| ProtTrans [75] | UniRef, UniParc, and Big Fantastic Database | BERT, T5, Transformer-XL, Albert, Electra, XLNet | Secondary structure prediction; per-protein location and membrane prediction |
| ESM-1b Transformer [76] | UniParc | Transformer | Remote homology detection, prediction of secondary structure and tertiary contacts, prediction of mutational effects |
| ProteinBERT [77] | UniRef, Gene Ontology | BERT extended | Prediction of secondary structure, remote homology, fluorescence, and protein stability |
| MSA Transformer [96] | Multiple sequence alignments (MSA) based on UniRef | Modified transformer | Contact prediction, secondary structure prediction |
| ProtGPT2 [82] | UniRef | GPT2 | Sequence generation, homology detection, disorder prediction |
| SignalP 6.0 [97] | UniProt, PROSITE, TOPDB | ProtBERT + CRF | Detection of signal peptide types |
| Tranception [98] | UniProt, ProteinGym | Autoregressive transformer | Protein fitness prediction |
| [99] | UniProt, EVmutation | ESM-1b transformer, variational autoencoder and more | Protein fitness prediction |
| TMBed [100] | OPM, SignalP 6.0 | ProtT5 + CNN | Prediction of transmembrane classes for each residue |
| ReLSO [101] | GIFFORD, GB1, GFP, TAPE | Transformer-based encoder | Designing new protein sequences |
| [102] | BIOSNAP, DAVIS, and BindingDB | ProtBERT + ChemBERTa | Prediction of drug-target interactions |
| STEP [103] | BIOSNAP [104] | Siamese ProtBERT | Prediction of protein-protein interactions |