Table 2 Results from the literature search. Articles included in the literature search on recent publications about performance metrics of ML-based classification models (sorted according to the “most recent” criterion). The table documents the performance metrics used and the rating of whether risk-based considerations were included, as specified in Research Question A – Utilization of risk-based performance metrics in recent scientific publications

From: Risk-based evaluation of machine learning-based classification methods used for medical devices

First author + ref no.

Used performance metrics

Resulting category (as described in Research Question A – Utilization of risk-based performance metrics in recent scientific publications)

Ozcan [30]

Acc, Sen, Prec

Additional metrics (without direct risk integration): Determinism (this metric was neither described nor reliably referenced)

noRC / noRP

Garavand [31]

Acc, Prec, Sens, Spec, F1 Score, ROC, AUROC, AUPRC

noRC / noRP

ElSeddawy [32]

Acc, Sens, Spec, F1 Score, G-mean, ROC, AUROC, (unweighted) Kappa

noRC / noRP

Kasim [33]

Acc, Prec, NPV, Sen, Spec, AUROC, (unweighted) Kappa

Additional metrics (without direct risk integration): net reclassification index (NRI)

noRC / RP

In this case, the underlying application (mortality prediction) was itself closely tied to risk assessment. The evaluation therefore included risk factors to some extent, even though standardized metrics were used. The effects caused by errors of the ML system itself were not additionally considered.

Farhang-Sardroodi [34]

ROC, AUROC

noRC / noRP

Wu [29]

Acc, Prec, Sen, F1-Score, ROC, AUROC

noRC / noRP

Preto [35]

Acc, Prec, Sen, F1-Score, AUROC

noRC / noRP

González-Cebrián [36]

Acc, Sen, Spec, F1-Score, MCC, AUROC

noRC / RP

In this case, the underlying application (mortality prediction) was itself closely tied to risk assessment. The evaluation therefore included risk factors to some extent, even though standardized metrics were used. The effects caused by errors of the ML system itself were not additionally considered.

He [37]

Acc, Prec, Sen, F1-Score, ROC, AUROC

noRC / noRP

Milara [38]

Acc, Prec, Sen, Spec, F1-Score, AUROC

noRC / noRP

Emakhu [39]

Acc, Prec, Sen, Spec, MCC, F1 score, ROC, AUROC

RC / RP

In this case, the underlying application (acute coronary syndrome prediction) was itself related to risk assessment. In addition to standardized metrics, the evaluation of the models included a cost-sensitive approach.

Haq [40]

Acc, Prec, NPV, Sen, Spec, ROC

Additional metrics (without direct risk integration): Dice Similarity Coefficient (DSC), Probabilistic Rand Index (PRI)

noRC / noRP

Movahed [41]

Acc, Sen, Spec, F1-Score, ROC, AUROC

Additional metrics (without direct risk integration): False Discovery Rate

noRC / noRP

Templeton [42]

Acc, Prec, Sen

noRC / noRP

Zou [43]

Acc, BA, Prec, Sen, Spec, F1-Score, MCC, ROC, AUROC

noRC / noRP

Tran [44]

Acc, F1-Score, ROC, AUROC

noRC / noRP

Maskew [45]

Acc, PPV, NPV, ROC, AUROC

noRC / noRP

Mabrouk [46]

Acc, BA, Prec, Sens, F1 score

noRC / noRP

Khan [47]

Acc, Prec, Sens, F1 score

noRC / noRP

Ho [48]

Acc, Prec, Sens, F1 score

noRC / noRP

Eissa [49]

Acc, Prec, Sens, MCC, F1 Score, ROC, AUROC

noRC / noRP

Salimpour [50]

Acc, Prec, Sens, (unweighted) Kappa

noRC / noRP

Berenguer-Vidal [51]

Acc, Prec, Sen, Spec

noRC / noRP

Dritsas [52]

Acc, Prec, Sens, F1 Score, AUROC

noRC / noRP

Ahmad [53]

Acc, Prec, Sen, Spec, ROC

noRC / noRP

Goñi [54]

BA, Prec, NPV, Sens, Spec, ROC, AUROC

noRC / noRP

Dubol [55]

Acc, AUROC

noRC / noRP

Hidayat [56]

Acc, Sen, Spec, ROC, AUROC

noRC / noRP

Baskozos [57]

BA, MCC, AUPRC

noRC / noRP

Shakhovska [58]

Acc, Prec, Sens, F1 Score, AUROC

noRC / noRP
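
To make the abbreviations in the table concrete, the following minimal sketch computes several of the listed standard metrics (Acc, Sen, Spec, Prec, NPV, BA, F1 score, MCC) from the counts of a binary confusion matrix. It also contrasts them with a simple cost-weighted error measure, as a loose illustration of what a cost-sensitive evaluation could look like. The function names, the example counts, and the cost values are hypothetical and are not taken from any of the listed articles. The cost measure shown here is an assumption for illustration, not the method used in [39].

```python
import math


def confusion_metrics(tp, fp, fn, tn):
    """Standard metrics from binary confusion-matrix counts.

    Assumes all denominators are non-zero (both classes are present
    and the classifier predicts both classes at least once).
    """
    sen = tp / (tp + fn)            # sensitivity (recall)
    spec = tn / (tn + fp)           # specificity
    prec = tp / (tp + fp)           # precision (PPV)
    npv = tn / (tn + fn)            # negative predictive value
    acc = (tp + tn) / (tp + fp + fn + tn)
    ba = (sen + spec) / 2           # balanced accuracy
    f1 = 2 * prec * sen / (prec + sen)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"Acc": acc, "Sen": sen, "Spec": spec, "Prec": prec,
            "NPV": npv, "BA": ba, "F1": f1, "MCC": mcc}


def mean_error_cost(fp, fn, n, cost_fp, cost_fn):
    """Average cost per case; cost_fp and cost_fn are hypothetical weights."""
    return (cost_fp * fp + cost_fn * fn) / n


# Two hypothetical classifiers evaluated on the same 200 cases.
model_a = dict(tp=90, fp=30, fn=10, tn=70)   # few missed positives
model_b = dict(tp=70, fp=10, fn=30, tn=90)   # few false alarms

metrics_a = confusion_metrics(**model_a)
metrics_b = confusion_metrics(**model_b)

# Assumed weights: a missed positive is ten times as costly as a false alarm.
cost_a = mean_error_cost(model_a["fp"], model_a["fn"], 200, cost_fp=1, cost_fn=10)
cost_b = mean_error_cost(model_b["fp"], model_b["fn"], 200, cost_fp=1, cost_fn=10)

print(metrics_a["Acc"], metrics_b["Acc"])
print(cost_a, cost_b)
```

In this constructed example both classifiers reach the same accuracy, yet their cost-weighted error differs because the two types of error are weighted differently. This is the kind of distinction that standardized metrics alone do not capture.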