Risk-based evaluation of machine learning-based classification methods used for medical devices

Haimerl, Martin; Reich, Christoph

doi:10.1186/s12911-025-02909-9

BMC Medical Informatics and Decision Making

Table 2 Results from literature search. Articles included in the literature search on recent publications about performance metrics of ML-based classification models (sorted according to the “most recent” criterion). The table lists the performance metrics used and the rating of whether risk-based considerations were included, as specified in Research Question A – Utilization of risk-based performance metrics in recent scientific publications

First author + ref no.	Used performance metrics	Resulting category (as described in Research Question A – Utilization of risk-based performance metrics in recent scientific publications)
Ozcan [30]	Acc, Sen, Prec. Additional metrics (without direct risk integration): Determinism (neither described nor reliably referenced)	noRC / noRP
Garavand [31]	Acc, Prec, Sens, Spec, F1 Score, ROC, AUROC, AUPRC	noRC / noRP
ElSeddawy [32]	Acc, Sens, Spec, F1 Score, G-mean, ROC, AUROC, (unweighted) Kappa	noRC / noRP
Kasim [33]	Acc, Prec, NPV, Sen, Spec, AUROC, (unweighted) Kappa. Additional metrics (without direct risk integration): net reclassification index (NRI)	noRC / RP. In this case, the underlying application (mortality prediction) was itself closely related to risk assessment. The evaluation therefore included risk factors to some extent, even though standardized metrics were used. The effects caused by errors of the ML system itself were not additionally considered.
Farhang-Sardroodi [34]	ROC, AUROC	noRC / noRP
Wu [29]	Acc, Prec, Sen, F1-Score, ROC, AUROC	noRC / noRP
Preto [35]	Acc, Prec, Sen, F1-Score, AUROC	noRC / noRP
González-Cebrián [36]	Acc, Sen, Spec, F1-Score, MCC, AUROC	noRC / RP. In this case, the underlying application (mortality prediction) was itself closely related to risk assessment. The evaluation therefore included risk factors to some extent, even though standardized metrics were used. The effects caused by errors of the ML system itself were not additionally considered.
He [37]	Acc, Prec, Sen, F1-Score, ROC, AUROC	noRC / noRP
Milara [38]	Acc, Prec, Sen, Spec, F1-Score, AUROC	noRC / noRP
Emakhu [39]	Acc, Prec, Sen, Spec, MCC, F1 score, ROC, AUROC	RC / RP. In this case, the underlying application (acute coronary syndrome prediction) was itself related to risk assessment. In addition to standardized metrics, the evaluation of the models included a cost-sensitive approach.
Haq [40]	Acc, Prec, NPV, Sen, Spec, ROC. Additional metrics (without direct risk integration): Dice Similarity Coefficient (DSC), Probabilistic Rand Index (PRI)	noRC / noRP
Movahed [41]	Acc, Sen, Spec, F1-Score, ROC, AUROC. Additional metrics (without direct risk integration): False Discovery Rate	noRC / noRP
Templeton [42]	Acc, Prec, Sen	noRC / noRP
Zou [43]	Acc, BA, Prec, Sen, Spec, F1-Score, MCC, ROC, AUROC	noRC / noRP
Tran [44]	Acc, F1-Score, ROC, AUROC	noRC / noRP
Maskew [45]	Acc, PPV, NPV, ROC, AUROC	noRC / noRP
Mabrouk [46]	Acc, BA, Prec, Sens, F1 score	noRC / noRP
Khan [47]	Acc, Prec, Sens, F1 score	noRC / noRP
Ho [48]	Acc, Prec, Sens, F1 score	noRC / noRP
Eissa [49]	Acc, Prec, Sens, MCC, F1 Score, ROC, AUROC	noRC / noRP
Salimpour [50]	Acc, Prec, Sens, (unweighted) Kappa	noRC / noRP
Berenguer-Vidal [51]	Acc, Prec, Sen, Spec	noRC / noRP
Dritsas [52]	Acc, Prec, Sens, F1 Score, AUROC	noRC / noRP
Ahmad [53]	Acc, Prec, Sen, Spec, ROC	noRC / noRP
Goñi [54]	BA, Prec, NPV, Sens, Spec, ROC, AUROC	noRC / noRP
Dubol [55]	Acc, AUROC	noRC / noRP
Hidayat [56]	Acc, Sen, Spec, ROC, AUROC	noRC / noRP
Baskozos [57]	BA, MCC, AUPRC	noRC / noRP
Shakhovska [58]	Acc, Prec, Sens, F1 Score, AUROC	noRC / noRP
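
The abbreviations in the table refer to standard classification metrics. As a reading aid, the following is a minimal sketch of how the threshold-based ones can be derived from the four counts of a binary confusion matrix. The expansions of the abbreviations (Acc = accuracy, Sen/Sens = sensitivity, Spec = specificity, Prec = precision/PPV, BA = balanced accuracy, MCC = Matthews correlation coefficient) and the helper names `safe_div` and `binary_metrics` are assumptions made for illustration and are not taken from the article. Curve-based metrics (ROC, AUROC, AUPRC) and Kappa are left out.

```python
import math


def safe_div(numerator, denominator):
    """Return numerator / denominator, or NaN if the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from binary confusion-matrix counts (illustrative helper)."""
    sen = safe_div(tp, tp + fn)    # sensitivity (recall, true positive rate)
    spec = safe_div(tn, tn + fp)   # specificity (true negative rate)
    prec = safe_div(tp, tp + fp)   # precision (positive predictive value)
    npv = safe_div(tn, tn + fn)    # negative predictive value
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "Acc": safe_div(tp + tn, tp + fp + tn + fn),
        "Sen": sen,
        "Spec": spec,
        "Prec": prec,
        "NPV": npv,
        "BA": (sen + spec) / 2,                     # balanced accuracy
        "F1": safe_div(2 * tp, 2 * tp + fp + fn),   # harmonic mean of Prec and Sen
        "G-mean": math.sqrt(sen * spec),            # geometric mean of Sen and Spec
        "MCC": safe_div(tp * tn - fp * fn, mcc_denominator),
    }


if __name__ == "__main__":
    # Made-up counts, purely for demonstration.
    for name, value in binary_metrics(tp=40, fp=10, tn=45, fn=5).items():
        print(f"{name}: {value:.3f}")
```

Each of these metrics depends only on the error counts: a false negative and a false positive enter the formulas without any weighting by their clinical consequences.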

ISSN: 1472-6947
