From: DREAMER: a computational framework to evaluate readiness of datasets for machine learning
Dataset | FHS (n = 5209) | | ADNI (n = 2376) | | | | | WDBC (n = 569) | |
---|---|---|---|---|---|---|---|---|---|
Classification feature | Probable dementia present | | Dementia diagnosis at baseline | | | | | Breast cancer diagnosis | |
Labels characteristic | No Dementia (n = 1245) | Probable Dementia (n = 1220) | CN (n = 534) | LMCI (n = 672) | EMCI (n = 411) | SMC (n = 325) | AD (n = 407) | Benign (not cancerous) (n = 357) | Malignant (cancerous) (n = 212) |
Age | 44.3 ± 8.3 [29, 62] | 43 ± 8 [29, 62] | 73.4 ± 6.2 [55, 89] | 73.7 ± 7.5 [54, 91] | 71.2 ± 7.4 [55, 89] | 71 ± 6.4 [56, 90] | 74.8 ± 7.9 [55, 90] | ||
Gender, male, n (%) | 607 (48.7%) | 384 (31.4%) | 252 (47.1%) | 411 (61.1%) | 227 (55.2%) | 126 (38.7%) | 230 (56.5%) | ||
Education | 4.9 ± 2.9 [0, 10] | 5 ± 2.3 [0, 10] | 16.4 ± 2.6 [6, 20] | 15.9 ± 2.8 [4, 20] | 16 ± 2.6 [10, 20] | 16.7 ± 2.3 [8, 20] | 15.1 ± 2.9 [4, 20] | ||
Data quality scores | PC = 0.7674; Spearman correlation = 0.7363; Missing values = 0.2689; Outliers = 1.0; Class overlap = 0.8071; Total weighted quality = 0.6481 | | PC = 0.7466; Spearman correlation = 0.7312; Missing values = 0.7497; Outliers = 0.1347; Class overlap = 0.1654; Total weighted quality = 0.4656 | | | | | PC = 0.6052; Spearman correlation = 0.5782; Missing values = 1; Outliers = 0.0875; Class overlap = 0.9315; Total weighted quality = 0.6335 | |
Classification / Clustering accuracy | Classification accuracy = 0.8594; Clustering accuracy = 0.4332 | | Classification accuracy = 0.5134; Clustering accuracy = 0.6012 | | | | | Classification accuracy = 0.4804; Clustering accuracy = 0.5792 | |
Number of features | 81 | | 45 | | | | | 30 | |
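
The table reports a total weighted quality score for each dataset but does not list the weights behind it. As a rough illustration of how five component scores could be combined, the sketch below computes a weighted mean. The function name `weighted_quality` and the uniform default weights are assumptions made for this example, not the DREAMER implementation.

```python
# Component scores transcribed from the "Data quality scores" row of the table.
SCORES = {
    "FHS": {"PC": 0.7674, "Spearman correlation": 0.7363,
            "Missing values": 0.2689, "Outliers": 1.0, "Class overlap": 0.8071},
    "ADNI": {"PC": 0.7466, "Spearman correlation": 0.7312,
             "Missing values": 0.7497, "Outliers": 0.1347, "Class overlap": 0.1654},
    "WDBC": {"PC": 0.6052, "Spearman correlation": 0.5782,
             "Missing values": 1.0, "Outliers": 0.0875, "Class overlap": 0.9315},
}

# Totals as reported in the table, kept for comparison only.
REPORTED_TOTAL = {"FHS": 0.6481, "ADNI": 0.4656, "WDBC": 0.6335}


def weighted_quality(scores, weights=None):
    """Weighted mean of component scores.

    Hypothetical helper: the weights used in the paper are not given in the
    table, so uniform weights are assumed when none are passed.
    """
    if weights is None:
        weights = {name: 1.0 for name in scores}
    total_weight = sum(weights[name] for name in scores)
    return sum(weights[name] * value for name, value in scores.items()) / total_weight


if __name__ == "__main__":
    for dataset, scores in SCORES.items():
        uniform = weighted_quality(scores)
        print(f"{dataset}: uniform-weight mean = {uniform:.4f}, "
              f"reported total = {REPORTED_TOTAL[dataset]:.4f}")
```

With uniform weights the means come out to about 0.7159 (FHS), 0.5055 (ADNI), and 0.6405 (WDBC), none of which matches the reported totals. This suggests that the published scores use non-uniform weights or a different aggregation, neither of which can be recovered from the table alone.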