Fig. 1 | BMC Medical Informatics and Decision Making

Fig. 1

From: DREAMER: a computational framework to evaluate readiness of datasets for machine learning

DREAMER framework. a The DREAMER workflow for evaluating the readiness of a tabular dataset for machine learning. The input is the tabular dataset under evaluation; it passes through a sequence of automated procedures that output a structured tabular dataset suited to machine learning analysis. b The data space D is transformed into the data readiness space D’ by constructing a new dataset from the master dataset. The master dataset has dimensions N×M and the data readiness dataset has dimensions d×k, where d is the number of random sub-tables and k is the number of data quality measures. c The weights of the data quality measures are learned from D’ by regression, with the average accuracy of clustering and classification as the target value. Once the weights are learned, the weighted total quality of each sub-table is computed to identify the sub-table with the highest data quality. d The search space of DREAMER scales with the size of the master dataset, in both rows and columns. DREAMER is run R times; the best sub-table of each run is treated as a local maximum, and the sub-table with the highest data quality across all runs is selected as a potential global maximum.
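
The steps in panels b to d can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the two quality measures (completeness and the share of non-constant columns), the `score_fn` callback standing in for the average clustering and classification accuracy, the sub-table size limits, and the use of ordinary least squares as the regression step are all assumptions made for the example. The function names are hypothetical.

```python
import numpy as np


def quality_measures(sub):
    """Return k = 2 placeholder quality measures for one sub-table (assumed, not from the paper)."""
    completeness = 1.0 - np.isnan(sub).mean()
    # Share of columns that hold more than one distinct non-missing value.
    non_constant = np.mean([np.unique(col[~np.isnan(col)]).size > 1 for col in sub.T])
    return np.array([completeness, non_constant])


def run_once(master, d, score_fn, rng):
    """One run: build the d x k readiness dataset, learn weights, return the best sub-table."""
    n_rows, n_cols = master.shape
    subtables, Q, y = [], [], []
    for _ in range(d):
        # Random sub-table: a random subset of rows and of columns of the N x M master table.
        rows = rng.choice(n_rows, size=rng.integers(2, n_rows + 1), replace=False)
        cols = rng.choice(n_cols, size=rng.integers(1, n_cols + 1), replace=False)
        sub = master[np.ix_(rows, cols)]
        subtables.append((rows, cols))
        Q.append(quality_measures(sub))
        y.append(score_fn(sub))  # stand-in for average clustering/classification accuracy
    Q = np.vstack(Q)             # readiness dataset D', shape d x k
    y = np.asarray(y, dtype=float)
    # Learn one weight per quality measure (ordinary least squares is an assumption here).
    weights, *_ = np.linalg.lstsq(Q, y, rcond=None)
    total = Q @ weights          # weighted total quality of each sub-table
    best = int(np.argmax(total))
    return subtables[best], float(total[best]), weights


def search(master, d, R, score_fn, seed=0):
    """Repeat the run R times and keep the best local maximum as the candidate global maximum."""
    rng = np.random.default_rng(seed)
    runs = [run_once(master, d, score_fn, rng) for _ in range(R)]
    return max(runs, key=lambda run: run[1])
```

Calling `search(master, d, R, score_fn)` on a float array with `NaN` marking missing cells returns the row and column indices of the selected sub-table, its weighted total quality, and the weights learned in that run. In a real setting `score_fn` would train and evaluate clustering and classification models on each sub-table.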
