Table 4 The results of the developed Machine Learning (ML) models are presented along with their respective 95% confidence intervals (C.I.)

From: Second opinion machine learning for fast-track pathway assignment in hip and knee replacement surgery: the use of patient-reported outcome measures

 

| Metric | HST | FIGS | LR | SVM | RF | XGB | DT | MLP |
|---|---|---|---|---|---|---|---|---|
| Accuracy | 0.693 ± 0.06 | **0.814 ± 0.025** | **0.751 ± 0.056** | **0.742 ± 0.057** | **0.778 ± 0.054** | **0.76 ± 0.056** | 0.667 ± 0.062 | **0.773 ± 0.055** |
| Sensitivity | 0.61 ± 0.075 | 0.847 ± 0.028 | 0.744 ± 0.067 | 0.756 ± 0.066 | 0.799 ± 0.061 | **0.97 ± 0.026** | 0.659 ± 0.073 | 0.799 ± 0.061 |
| Specificity | **0.918 ± 0.069** | 0.725 ± 0.056 | **0.77 ± 0.106** | 0.705 ± 0.114 | 0.721 ± 0.113 | 0.197 ± 0.1 | 0.689 ± 0.116 | 0.705 ± 0.114 |
| PPV | **0.952 ± 0.041** | **0.892 ± 0.024** | **0.897 ± 0.051** | **0.873 ± 0.055** | **0.885 ± 0.051** | 0.764 ± 0.058 | **0.85 ± 0.062** | **0.879 ± 0.052** |
| NPV | **0.467 ± 0.089** | **0.639 ± 0.057** | **0.528 ± 0.104** | **0.518 ± 0.107** | **0.571 ± 0.111** | **0.706 ± 0.217** | **0.429 ± 0.098** | **0.566 ± 0.111** |
| AUC | 0.804 ± 0.002 | **0.852 ± 0.001** | 0.831 ± 0.002 | 0.805 ± 0.002 | 0.848 ± 0.002 | 0.824 ± 0.002 | 0.769 ± 0.002 | 0.82 ± 0.002 |
| F1 | 0.743 ± 0.058 | **0.869 ± 0.02** | **0.813 ± 0.048** | **0.81 ± 0.048** | **0.84 ± 0.044** | **0.855 ± 0.038** | 0.742 ± 0.056 | **0.837 ± 0.044** |
| Brier | **0.183 ± 0.053** | **0.173 ± 0.024** | **0.183 ± 0.037** | **0.152 ± 0.08** | **0.17 ± 0.028** | **0.213 ± 0.283** | **0.193 ± 0.083** | **0.163 ± 0.083** |
| Bal. Acc | 0.764 ± 0.002 | **0.786 ± 0.001** | 0.757 ± 0.002 | 0.731 ± 0.002 | 0.76 ± 0.002 | 0.583 ± 0.001 | 0.674 ± 0.002 | 0.752 ± 0.002 |
| MCC | **0.47 ± 0.128** | **0.552 ± 0.064** | **0.468 ± 0.128** | **0.425 ± 0.128** | **0.487 ± 0.128** | 0.28 ± 0.128 | 0.311 ± 0.128 | **0.473 ± 0.128** |
| sNB | 0.579 ± 0.003 | **0.745 ± 0.001** | 0.659 ± 0.003 | 0.646 ± 0.004 | 0.695 ± 0.003 | 0.671 ± 0.005 | 0.543 ± 0.004 | 0.689 ± 0.003 |
| HC Acc. | **0.741 ± 0.057** | **0.807 ± 0.026** | **0.789 ± 0.053** | **0.763 ± 0.056** | **0.807 ± 0.052** | **0.781 ± 0.054** | **0.768 ± 0.055** | **0.781 ± 0.054** |
| HC Sens. | 0.67 ± 0.072 | 0.759 ± 0.033 | 0.72 ± 0.069 | 0.697 ± 0.07 | 0.727 ± 0.068 | **0.939 ± 0.037** | 0.729 ± 0.068 | 0.705 ± 0.07 |
| HC Spec. | **0.907 ± 0.073** | **0.908 ± 0.036** | **0.923 ± 0.067** | **0.895 ± 0.077** | **0.973 ± 0.041** | 0.375 ± 0.121 | **0.886 ± 0.08** | **0.944 ± 0.057** |
| HC PPV | **0.944 ± 0.044** | **0.946 ± 0.018** | **0.947 ± 0.038** | **0.93 ± 0.042** | **0.982 ± 0.021** | 0.794 ± 0.055 | **0.951 ± 0.037** | **0.965 ± 0.03** |
| HC NPV | **0.542 ± 0.089** | **0.639 ± 0.057** | **0.632 ± 0.1** | **0.596 ± 0.106** | **0.632 ± 0.108** | **0.706 ± 0.217** | **0.517 ± 0.099** | **0.596 ± 0.11** |
| Coverage | **0.636 ± 0.063** | **0.679 ± 0.031** | 0.507 ± 0.065 | 0.507 ± 0.065 | 0.507 ± 0.065 | 0.507 ± 0.065 | **0.631 ± 0.063** | 0.507 ± 0.065 |

  1. The confidence intervals for key metrics such as accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated from the variance formula for binomial distributions. The C.I. for the AUC and sNB were computed according to the formulas described in [36], those for the balanced accuracy according to the formula described in [37], and those for the Brier score according to the formula described in [38]; the C.I. for the MCC were computed by applying Hoeffding’s inequality. For each metric, values in bold are those not significantly worse than the top-ranked one, as measured by overlap of the 95% C.I.
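
As an illustration of the footnote, the following is a minimal Python sketch of three of the ingredients it names: a binomial (normal-approximation) half-width, a Hoeffding-style half-width, and the interval-overlap check behind the bold marking. The function names, the sample size `n`, and the choice of the normal approximation are assumptions made for illustration; the source states neither the exact binomial variant the authors used nor how Hoeffding's inequality was mapped onto the MCC, so this does not claim to reproduce their computation.

```python
import math

Z_95 = 1.96  # standard normal quantile for a two-sided 95% interval


def binomial_half_width(p: float, n: int, z: float = Z_95) -> float:
    """Normal-approximation half-width for a proportion p observed on n cases."""
    return z * math.sqrt(p * (1.0 - p) / n)


def hoeffding_half_width(n: int, alpha: float = 0.05,
                         value_range: float = 1.0) -> float:
    """Two-sided Hoeffding half-width for the mean of n independent values
    confined to an interval of width value_range."""
    return value_range * math.sqrt(math.log(2.0 / alpha) / (2.0 * n))


def overlaps(a: tuple, b: tuple) -> bool:
    """True if the intervals a and b, each given as (estimate, half_width), overlap."""
    (est_a, hw_a), (est_b, hw_b) = a, b
    return est_a - hw_a <= est_b + hw_b and est_b - hw_b <= est_a + hw_a


if __name__ == "__main__":
    n = 200  # hypothetical test-set size, not taken from the source
    print(binomial_half_width(0.75, n))
    print(hoeffding_half_width(n))

    # Accuracy row of the table: FIGS is top-ranked at 0.814 ± 0.025.
    figs = (0.814, 0.025)
    print(overlaps(figs, (0.751, 0.056)))  # LR overlaps FIGS  -> True
    print(overlaps(figs, (0.693, 0.06)))   # HST does not      -> False
```

The overlap check matches the bold marking in the Accuracy row: LR's interval reaches 0.807, above the lower end of FIGS's interval at 0.789, while HST's interval stops at 0.753.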