
Table 5 Performance evaluation metrics

From: Improved liver disease prediction from clinical data through an evaluation of ensemble learning approaches

Metrics

Calculation

Description

Accuracy

\(\frac{TP + TN}{TP + TN + FP + FN}\)

The proportion of all predictions in which the model correctly classifies a patient as having liver disease (LD) or not having liver disease (NLD). A higher accuracy indicates that the model is better at correctly classifying both individuals with LD and those without LD.

Precision

\(\frac{TP}{TP + FP}\)

The proportion of LD predictions that are correct, i.e., the true positives out of all patients the model predicts as having LD (TP + FP). A higher precision means the model is more reliable when it flags a patient as having LD, with fewer individuals incorrectly classified as positive when they are actually negative.

Recall/TPR

\(\frac{TP}{TP + FN}\)

The proportion of patients who actually have LD that the model correctly predicts as LD, i.e., the true positives out of all actual LD cases (TP + FN). Recall is also called the true-positive rate (TPR). A higher recall indicates that the model is better at capturing cases of LD, meaning it is less likely to miss individuals who actually have the condition.

F1-score

\(\frac{2\times TP}{2\times TP + FP+ FN}\)

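The harmonic mean of precision and recall, \(\frac{2\times Precision\times Recall}{Precision+Recall}\), which simplifies to the formula shown. A higher F1-score indicates a better balance between precision and recall for the LD (positive) class. Because TN does not appear in the formula, the F1-score does not reflect how well NLD instances are identified.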

Specificity

\(\frac{TN}{TN + FP}\)

The proportion of patients who actually do not have LD that the model correctly predicts as NLD, i.e., the true negatives out of all actual NLD cases (TN + FP). A higher specificity indicates that the model is better at avoiding false alarms of LD.

Macro average (MA)

\(\frac{1}{4}\sum\limits _{c=0}^{3}{A}_{c}^{m}\)

The unweighted arithmetic mean of a per-class metric, where \({A}_{c}^{m}\) is the value of metric m for class c, c ranges over classes 0 to 3, and m denotes precision, recall, or F1-score.

Weighted average (WA)

\(\sum\limits _{c=0}^{3}{w}_{c}\times {A}_{c}^{m}\)

The weighted arithmetic mean of the per-class values of precision, recall, or F1-score, where \({w}_{c}\) is the weight of class c (commonly the fraction of instances belonging to class c) and \({w}_{0}+{w}_{1}+{w}_{2}+{w}_{3}=1\).

Negative predictive value (NPV)

\(\frac{TN}{TN + FN}\)

The proportion of NLD predictions that are correct, i.e., the true negatives out of all patients the model predicts as not having LD (TN + FN). A higher NPV indicates that the model is better at ruling out LD in individuals who are actually disease-free. It gives greater confidence that a patient predicted as NLD truly does not have LD. This reduces missed diagnoses, since fewer individuals who need medical attention are mistakenly classified as healthy.

Matthews correlation coefficient (MCC)

\(\frac{TP\times TN-FP\times FN}{\sqrt{\left(TP+FP\right)\times \left(TP+FN\right)\times \left(TN+FP\right)\times \left(TN+FN\right)}}\)

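The correlation between the predicted and actual labels. Because it is computed from all four confusion-matrix counts, it reflects balanced performance in predicting both LD and NLD. MCC ranges from -1 (complete disagreement) through 0 (no better than random prediction) to +1 (perfect prediction). A higher MCC indicates stronger agreement between the model’s predictions and the actual outcomes.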

False-positive rate (FPR)

\(\frac{FP}{FP + TN}\)

The proportion of patients who actually do not have LD that the model incorrectly predicts as LD, i.e., the false positives out of all actual NLD cases (FP + TN). FPR equals 1 - specificity. A lower FPR indicates that the model is better at correctly identifying individuals without LD, which reduces false alarms and improves the reliability of the diagnostic process. Reducing the FPR is crucial in medical diagnosis because it helps minimize unnecessary stress, follow-up tests, and treatments for individuals who are actually disease-free.

False-negative rate (FNR)

\(\frac{FN}{TP + FN}\)

The proportion of patients who actually have LD that the model incorrectly predicts as NLD, i.e., the false negatives out of all actual LD cases (TP + FN). FNR equals 1 - recall. A lower FNR indicates that the model is better at identifying individuals with LD, which reduces the likelihood of missed diagnoses. Reducing the FNR is crucial in medical diagnosis because it helps ensure that individuals who have LD are correctly identified and receive timely treatment.

False discovery rate (FDR)

\(\frac{FP}{FP + TP}\)

The proportion of LD predictions that are incorrect, i.e., the false positives out of all patients the model predicts as having LD (FP + TP). FDR equals 1 - precision. A lower FDR means fewer individuals are incorrectly classified as having LD when they are actually healthy. This helps avoid unnecessary stress, follow-up tests, and treatments for disease-free individuals. By minimizing FP predictions, the model becomes more reliable in identifying true cases of LD.

Misclassification rate (MCR)

\(\frac{FP + FN}{TP + TN + FP + FN}\)

The proportion of all predictions, for both LD and NLD, that are incorrect. MCR equals 1 - accuracy. A lower MCR indicates that the model identifies cases of LD accurately while minimizing incorrect classifications, which reflects a more effective and reliable diagnostic process.

Runtime

-

Amount of time (in minutes) required to execute the algorithm.
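The following is a minimal sketch in plain Python of how the metrics in Table 5 can be computed from confusion-matrix counts. It is not the authors' implementation. It assumes LD is the positive class, and the names `Confusion`, `binary_metrics`, `one_vs_rest`, `macro_average`, and `weighted_average` are hypothetical. The table does not state two further conventions, so they are assumptions here:

- A zero denominator yields 0.0.
- The weighted average uses each class's share of instances (its support) as \({w}_{c}\).

Per-class values for the four-class averages are obtained one-vs-rest.

```python
import math
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts for one positive class (e.g., LD vs. NLD)."""
    tp: int
    tn: int
    fp: int
    fn: int


def _ratio(num: float, den: float) -> float:
    # Assumption: report 0.0 when a denominator is zero instead of raising.
    return num / den if den else 0.0


def binary_metrics(c: Confusion) -> dict[str, float]:
    """Compute the confusion-matrix metrics listed in Table 5."""
    total = c.tp + c.tn + c.fp + c.fn
    mcc_den = math.sqrt(
        (c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn)
    )
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "precision": _ratio(c.tp, c.tp + c.fp),
        "recall": _ratio(c.tp, c.tp + c.fn),  # also called TPR
        "f1": _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
        "mcc": _ratio(c.tp * c.tn - c.fp * c.fn, mcc_den),
        "fpr": _ratio(c.fp, c.fp + c.tn),
        "fnr": _ratio(c.fn, c.tp + c.fn),
        "fdr": _ratio(c.fp, c.fp + c.tp),
        "mcr": _ratio(c.fp + c.fn, total),
    }


def one_vs_rest(y_true: list[int], y_pred: list[int], cls: int) -> Confusion:
    """Treat class `cls` as positive and every other class as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == cls and p == cls for t, p in pairs)
    fp = sum(t != cls and p == cls for t, p in pairs)
    fn = sum(t == cls and p != cls for t, p in pairs)
    return Confusion(tp=tp, tn=len(pairs) - tp - fp - fn, fp=fp, fn=fn)


def macro_average(per_class: list[float]) -> float:
    """Unweighted mean of a per-class metric A_c^m."""
    return sum(per_class) / len(per_class)


def weighted_average(per_class: list[float], support: list[int]) -> float:
    """Mean of A_c^m weighted by w_c = support_c / total (weights sum to 1)."""
    total = sum(support)
    return sum(a * n / total for a, n in zip(per_class, support))


if __name__ == "__main__":
    # Made-up counts for illustration only; not results from the study.
    counts = Confusion(tp=40, tn=45, fp=5, fn=10)
    for name, value in binary_metrics(counts).items():
        print(f"{name:>11}: {value:.3f}")

    # Made-up 4-class labels (classes 0 to 3) for the macro/weighted averages.
    y_true = [0, 1, 2, 3, 1, 0, 2, 3]
    y_pred = [0, 2, 2, 3, 1, 1, 2, 0]
    classes = [0, 1, 2, 3]
    recalls = [binary_metrics(one_vs_rest(y_true, y_pred, k))["recall"] for k in classes]
    support = [y_true.count(k) for k in classes]
    print("macro recall:", macro_average(recalls))
    print("weighted recall:", weighted_average(recalls, support))
```

If scikit-learn is installed, it offers an independent cross-check. `sklearn.metrics.matthews_corrcoef` computes MCC. The "macro avg" and "weighted avg" rows of `sklearn.metrics.classification_report` give the class averages, and `classification_report` weights by support, the same choice assumed above.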