Fig. 3
From: Collaborative learning from distributed data with differentially private synthetic data

The usefulness of sharing data via synthetic data sets is retained in the small data regime: the performance of a model trained with shared synthetic data from other parties (blue), all of which have similarly small local data, decreases much less than that of a model trained only on locally available data (orange). The higher mean log-likelihoods of the combined model over the local-only model are statistically highly significant (\(p < 0.001, n_{\text {local only}}={1\,000}, n_{\text {combined}}={100\,000}\)) for all data set sizes.