Table 5 The external test set is used to compare the best-performing multi-task model, an ensemble of the four best-performing single-task classifiers (both determined by average performance across all metrics on the internal and external test datasets), and the two expert annotators. Bold values indicate which AI model pipeline performed better. All F1 scores are macro averaged

From: Uncertainty-aware automatic TNM staging classification for [18F] Fluorodeoxyglucose PET-CT reports for lung cancer utilising transformer-based language models and multi-task learning

|             | ACC_TNMu ↑ | ACC_TNM ↑ | HL_TNMu ↓ | F1_TNMu ↑ | F1_T ↑   | F1_N ↑ | F1_M ↑   | F1_u ↑   |
|-------------|------------|-----------|-----------|-----------|----------|--------|----------|----------|
| Multi-task  | **0.79**   | 0.84      | **0.07**  | **0.89**  | **0.91** | 0.95   | 0.90     | **0.78** |
| Single task | 0.74       | 0.84      | 0.08      | 0.87      | 0.89     | 0.95   | **0.92** | 0.70     |
| Annotator 1 | 0.90       | 0.93      | 0.04      | 0.94      | 0.95     | 0.99   | 0.96     | 0.84     |
| Annotator 2 | 0.89       | 0.93      | 0.04      | 0.93      | 0.94     | 0.99   | 0.95     | 0.83     |
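The caption notes that all F1 scores are macro averaged. As a quick illustration (not the authors' code), macro averaging computes F1 independently for each class and then takes the unweighted mean, so rare stage labels count as much as common ones. The labels below are hypothetical and are not drawn from the paper's test set:

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: per-class F1 scores averaged without
    weighting by class frequency."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)

# Hypothetical T-stage predictions for illustration only
y_true = ["T1", "T2", "T2", "T4"]
y_pred = ["T1", "T2", "T3", "T4"]
print(round(macro_f1(y_true, y_pred), 3))  # → 0.667
```

Because every class contributes equally to the mean, a single poorly predicted rare class (here the spurious "T3") pulls the macro score down more than it would a frequency-weighted (micro) average.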