Study | Data sources | Model architecture | Biomedical tasks |
---|---|---|---|
ProtTrans [75] | UniRef, UniParc, and Big Fantastic Database (BFD) | BERT, T5, Transformer-XL, ALBERT, ELECTRA, XLNet | Prediction of secondary structure, per-protein subcellular location, and membrane-bound vs. soluble classification |
ESM-1b Transformer [76] | UniParc | Transformer | Remote homology detection, prediction of secondary structure and tertiary contacts, prediction of mutational effects |
ProteinBERT [77] | UniRef, Gene Ontology | Extended BERT | Prediction of secondary structure, remote homology, fluorescence, and protein stability |
MSA Transformer [96] | Multiple sequence alignments (MSAs) built from UniRef | Modified transformer | Contact prediction, secondary structure prediction |
ProtGPT2 [82] | UniRef | GPT-2 | Sequence generation, homology detection, and disorder prediction |
SignalP 6.0 [97] | UniProt, PROSITE, TOPDB | ProtBERT + CRF | Detection of signal peptide types |
Tranception [98] | UniProt, ProteinGym | Autoregressive transformer | Protein fitness prediction |
[99] | UniProt, EVmutation | ESM-1b transformer, variational autoencoder, and other models | Protein fitness prediction |
TMBed [100] | OPM, SignalP 6.0 | ProtT5 + CNN | Prediction of transmembrane classes for each residue |
ReLSO [101] | GIFFORD, GB1, GFP, TAPE | Transformer-based encoder | Designing new protein sequences |
[102] | BIOSNAP, DAVIS, and BindingDB | ProtBERT + ChemBERTa | Prediction of drug-target interactions |
STEP [103] | BIOSNAP [104] | Siamese ProtBERT | Prediction of protein-protein interactions |