
Table 1 Standard performance metrics. This list, taken from [11], describes performance metrics typically used for ML-based classification tasks. Only metrics that contain no risk-based considerations according to the specification in our paper are included. It is assumed that the numbers of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) are given. See [11] for more details on the definition and use of these metrics

From: Risk-based evaluation of machine learning-based classification methods used for medical devices

General / overarching definitions

Number of actual positive cases:

\(P = TP + FN\)

Number of actual negative cases:

\(N = TN + FP\)

Number of predicted positive cases:

\(PP = TP + FP\)

Number of predicted negative cases:

\(PN = TN + FN\)

Total Population:

\(Pop = P + N\)

Prevalence:

\(Prev = \frac{P}{P+N} = \frac{P}{Pop}\)
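As an illustration (not part of [11] or our paper), the derived quantities can be computed directly from the four counts; the Python sketch below uses made-up counts:

```python
# Derived counts from a binary confusion matrix.
# TP, FP, TN, FN are assumed given; the values are made up for illustration.
TP, FP, TN, FN = 80, 10, 95, 15

P = TP + FN     # number of actual positive cases
N = TN + FP     # number of actual negative cases
PP = TP + FP    # number of predicted positive cases
PN = TN + FN    # number of predicted negative cases
Pop = P + N     # total population
Prev = P / Pop  # prevalence

print(P, N, PP, PN, Pop, round(Prev, 3))  # 95 105 90 110 200 0.475
```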

Metrics documented in the literature review within this study

Sensitivity / Recall / True Positive Rate:

\(TPR = \frac{TP}{P}\)

Specificity / True Negative Rate:

\(TNR = \frac{TN}{N}\)

Accuracy:

\(Acc = \frac{TP + TN}{TP + FP + TN + FN}\)

and its complement, the Error Rate:

\(Err = 1 - Acc\)

Balanced Accuracy, i.e. accuracy after balancing the positive and negative test samples / class members:

\(BA = \frac{TPR + TNR}{2}\)
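A minimal Python sketch of the fixed-threshold rate metrics above, again with illustrative counts that are not from the paper:

```python
# Rate-based metrics from the four confusion-matrix counts (illustrative values).
TP, FP, TN, FN = 80, 10, 95, 15

TPR = TP / (TP + FN)                   # sensitivity / recall
TNR = TN / (TN + FP)                   # specificity
Acc = (TP + TN) / (TP + FP + TN + FN)  # accuracy
Err = 1 - Acc                          # error rate (complement of accuracy)
BA = (TPR + TNR) / 2                   # balanced accuracy

print(f"TPR={TPR:.3f} TNR={TNR:.3f} Acc={Acc:.3f} Err={Err:.3f} BA={BA:.3f}")
```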

Precision / Positive Predictive Value:

\(PPV = \frac{TP}{PP}\)

Negative Predictive Value:

\(NPV = \frac{TN}{PN}\)

\(F_1\)-Score:

\(F_1 = 2 \cdot \frac{PPV \cdot TPR}{PPV + TPR}\)

Other \(F_\beta\)-Scores:

\(F_\beta = \left(1 + \beta^2\right) \cdot \frac{PPV \cdot TPR}{\beta^2 \cdot PPV + TPR}\)

Matthews Correlation Coefficient:

\(MCC = \sqrt{TPR \cdot TNR \cdot PPV \cdot NPV} - \sqrt{\left(1 - TPR\right) \cdot \left(1 - TNR\right) \cdot \left(1 - PPV\right) \cdot \left(1 - NPV\right)}\)

Geometric Mean:

\(GM = \sqrt{TPR \cdot TNR}\)
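The prediction-based metrics admit the same treatment; the sketch below (again with illustrative counts) also uses the rate-based MCC form given in the table:

```python
import math

# Predictive values, F-scores, MCC, and geometric mean (illustrative counts).
TP, FP, TN, FN = 80, 10, 95, 15

TPR = TP / (TP + FN)
TNR = TN / (TN + FP)
PPV = TP / (TP + FP)  # precision / positive predictive value
NPV = TN / (TN + FN)  # negative predictive value

F1 = 2 * PPV * TPR / (PPV + TPR)  # harmonic mean of precision and recall

def f_beta(beta):
    """F_beta score: beta > 1 weights recall higher, beta < 1 weights precision."""
    return (1 + beta**2) * PPV * TPR / (beta**2 * PPV + TPR)

# MCC in the rate-based form from the table
MCC = math.sqrt(TPR * TNR * PPV * NPV) - math.sqrt(
    (1 - TPR) * (1 - TNR) * (1 - PPV) * (1 - NPV))
GM = math.sqrt(TPR * TNR)  # geometric mean of sensitivity and specificity

print(f"PPV={PPV:.3f} NPV={NPV:.3f} F1={F1:.3f} "
      f"F2={f_beta(2):.3f} MCC={MCC:.3f} GM={GM:.3f}")
```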

Measures that evaluate not a single model (fixed threshold) but multiple threshold variations

Receiver Operating Characteristic (ROC) Curve,

i.e. plot of \(FPR = \frac{FP}{N} = 1 - TNR\) (on the \(x\) axis)

vs. \(TPR\) (on the \(y\) axis).

Precision-Recall Curve (PRC),

i.e. plot of recall / \(TPR\) (on the \(x\) axis)

vs. precision / \(PPV\) (on the \(y\) axis).

Area under the ROC Curve:

\(AUROC = \int_0^1 ROC\left(x\right)\,dx\)

as the integral over the function \(ROC\left(x\right)\)

described by the ROC Curve

Area under the PRC Curve:

\(AUPRC = \int_0^1 PRC\left(x\right)\,dx\)

as the integral over the function \(PRC\left(x\right)\)

described by the PRC Curve
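As a sketch of how these threshold-sweep measures can be computed (the labels and scores below are made up, not from the paper):

```python
import numpy as np

# ROC curve via a threshold sweep, AUROC via trapezoidal integration.
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.5, 0.9, 0.3])

n_pos = int((y_true == 1).sum())
n_neg = int((y_true == 0).sum())

fpr, tpr = [0.0], [0.0]                      # the curve starts at (0, 0)
for t in np.sort(np.unique(y_score))[::-1]:  # strictest threshold first
    pred = y_score >= t
    tpr.append((pred & (y_true == 1)).sum() / n_pos)  # TPR at threshold t
    fpr.append((pred & (y_true == 0)).sum() / n_neg)  # FPR at threshold t

auroc = np.trapz(tpr, fpr)  # integral of TPR over the FPR axis
print(f"AUROC = {auroc:.3f}")
```

The Precision-Recall Curve and AUPRC follow the same pattern with \(PPV\) plotted over recall; in practice, scikit-learn's roc_curve, roc_auc_score, and precision_recall_curve provide these computations directly.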

Measures for comparison of two predictions

(Cohen’s) Kappa:

\(\kappa = \frac{p_0 - p_c}{1 - p_c}\)

where \(p_0\) is the observed agreement between the two predictions

and \(p_c\) is the agreement expected from a random prediction

(Cohen’s) Weighted Kappa:

(Cohen’s) Kappa \(\kappa\)

with additional weights included,

e.g. according to risks or costs
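For completeness, a minimal sketch of the (unweighted) Kappa for two binary predictions (the arrays are illustrative, not from the paper):

```python
import numpy as np

# Cohen's kappa for two binary predictions (illustrative arrays).
a = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])  # first prediction
b = np.array([1, 0, 0, 1, 0, 1, 0, 1, 1, 1])  # second prediction

p0 = (a == b).mean()  # observed agreement p_0
# chance agreement p_c: both predict 1 by chance plus both predict 0 by chance
pc = a.mean() * b.mean() + (1 - a.mean()) * (1 - b.mean())
kappa = (p0 - pc) / (1 - pc)
print(f"kappa = {kappa:.3f}")
```

The weighted variant replaces the 0/1 (dis)agreement with per-category-pair weights, e.g. risk- or cost-based ones; scikit-learn's cohen_kappa_score exposes linear and quadratic weighting via its weights parameter.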