Fig. 4 | BMC Medical Informatics and Decision Making

Fig. 4

From: Collaborative learning from distributed data with differentially private synthetic data

Training the analysis model only on Newcastle’s local data (full data without subsampling) results in poor predictive performance on global data because of skew in the local distribution (orange). Omitting the ethnicity feature when training on local data improves predictive performance (green). Combining local data with synthetic data also improves model performance while still using all features, i.e., without needing to change the model (blue). The dashed line indicates the log-likelihood of a model trained on the full population. Results are shown for ten independent repeats. The observed pairwise differences between the means of the distributions are highly statistically significant (\(p < 0.001,\ n_{\text{local only}} = 1\,000,\ n_{\text{combined}} = 100\,000\)).