Table 1 Study population of FHS, ADNI, and WDBC datasets and their characteristics

From: DREAMER: a computational framework to evaluate readiness of datasets for machine learning

Dataset-level characteristics:

| Characteristic | FHS (n = 5209) | ADNI (n = 2376) | WDBC (n = 569) |
| --- | --- | --- | --- |
| Classification feature | Probable dementia present | Dementia diagnosis at baseline | Breast cancer diagnosis |
| Data quality score: PC | 0.7674 | 0.7466 | 0.6052 |
| Data quality score: Spearman correlation | 0.7363 | 0.7312 | 0.5782 |
| Data quality score: Missing values | 0.2689 | 0.7497 | 1 |
| Data quality score: Outliers | 1.0 | 0.1347 | 0.0875 |
| Data quality score: Class overlap | 0.8071 | 0.1654 | 0.9315 |
| Data quality score: Total weighted quality | 0.6481 | 0.4656 | 0.6335 |
| Classification accuracy | 0.8594 | 0.5134 | 0.4804 |
| Clustering accuracy | 0.4332 | 0.6012 | 0.5792 |
| Number of features | 81 | 45 | 30 |

Label-level characteristics (no age, gender, or education values are given for WDBC):

| Dataset | Label | n | Age | Gender, male (%) | Education |
| --- | --- | --- | --- | --- | --- |
| FHS | No Dementia | 1245 | 44.3 ± 8.3 [29, 62] | 607 (48.7%) | 4.9 ± 2.9 [0, 10] |
| FHS | Probable Dementia | 1220 | 43 ± 8 [29, 62] | 384 (31.4%) | 5 ± 2.3 [0, 10] |
| ADNI | CN | 534 | 73.4 ± 6.2 [55, 89] | 252 (47.1%) | 16.4 ± 2.6 [6, 20] |
| ADNI | LMCI | 672 | 73.7 ± 7.5 [54, 91] | 411 (61.1%) | 15.9 ± 2.8 [4, 20] |
| ADNI | EMCI | 411 | 71.2 ± 7.4 [55, 89] | 227 (55.2%) | 16 ± 2.6 [10, 20] |
| ADNI | SMC | 325 | 71 ± 6.4 [56, 90] | 126 (38.7%) | 16.7 ± 2.3 [8, 20] |
| ADNI | AD | 407 | 74.8 ± 7.9 [55, 90] | 230 (56.5%) | 15.1 ± 2.9 [4, 20] |
| WDBC | Benign (not cancerous) | 357 | | | |
| WDBC | Malignant (cancerous) | 212 | | | |
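As a quick sanity check on the label counts in Table 1, the following minimal sketch computes each label's share of the labelled samples and the fraction of each dataset's stated n that the listed labels cover. The counts are copied from the table. The names `dataset_sizes`, `label_counts`, `label_shares`, and `labelled_fraction` are illustrative assumptions and are not part of DREAMER.

```python
# Dataset sizes and per-label counts copied from Table 1.
# All names below are hypothetical helpers for illustration only.
dataset_sizes = {"FHS": 5209, "ADNI": 2376, "WDBC": 569}
label_counts = {
    "FHS": {"No Dementia": 1245, "Probable Dementia": 1220},
    "ADNI": {"CN": 534, "LMCI": 672, "EMCI": 411, "SMC": 325, "AD": 407},
    "WDBC": {"Benign": 357, "Malignant": 212},
}


def label_shares(counts):
    """Return each label's fraction of the labelled samples."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def labelled_fraction(dataset):
    """Return the fraction of the dataset's stated n covered by the listed labels."""
    return sum(label_counts[dataset].values()) / dataset_sizes[dataset]


if __name__ == "__main__":
    for name, counts in label_counts.items():
        shares = ", ".join(
            f"{label}: {share:.1%}" for label, share in label_shares(counts).items()
        )
        print(f"{name}: labels cover {labelled_fraction(name):.1%} of n; {shares}")
```

The WDBC label counts sum to 569, matching the stated n. The FHS and ADNI label counts sum to 2465 and 2349 respectively, which is fewer than the stated dataset sizes; the table does not say how the remaining samples are handled.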