From: Exploring the tradeoff between data privacy and utility with a clinical data analysis use case
Dataset number | Re-identification risk (before) | Re-identification risk (after) | Re-identification risk reduction rate | ARX utility score | EMD | # of records retained for logistic regression | # of predictors retained for logistic regression | Dataset retention ratio |
---|---|---|---|---|---|---|---|---|
1 | 0.993 | 0.064 | 0.936 | 0.722 | 62.346 | 547 | 11 | 0.401 |
2 | 0.993 | 0.076 | 0.924 | 0.807 | 62.559 | 396 | 11 | 0.290 |
3 | 0.993 | 0.064 | 0.936 | 0.722 | 62.346 | 547 | 11 | 0.401 |
4 | 0.993 | 0.076 | 0.924 | 0.807 | 62.559 | 396 | 11 | 0.290 |
5 | 0.908 | 0.044 | 0.952 | 0.485 | 61.746 | 954 | 12 | 0.762 |
6 | 0.908 | 0.059 | 0.935 | 0.599 | 62.017 | 765 | 12 | 0.611 |
7 | 0.908 | 0.000 | 1.000 | 1.000 | 61.118 | 1119 | 7 | 0.522 |
8 | 0.908 | 0.000 | 1.000 | 1.000 | 61.118 | 1119 | 7 | 0.522 |
9 | 0.963 | 0.059 | 0.939 | 0.500 | 61.623 | 910 | 12 | 0.727 |
10 | 0.963 | 0.085 | 0.911 | 0.600 | 61.945 | 756 | 12 | 0.604 |
11 | 0.963 | 0.002 | 0.998 | 0.890 | 62.542 | 1155 | 9 | 0.692 |
12 | 0.963 | 0.002 | 0.998 | 0.846 | 62.737 | 1155 | 9 | 0.692 |
13 | 0.135 | 0.014 | 0.897 | 0.449 | 61.414 | 1113 | 13 | 0.964 |
14 | 0.135 | 0.002 | 0.986 | 0.654 | 61.521 | 1052 | 12 | 0.841 |
15 | 0.135 | 0.014 | 0.897 | 0.449 | 61.414 | 1113 | 13 | 0.964 |
16 | 0.135 | 0.014 | 0.897 | 0.449 | 61.414 | 1113 | 13 | 0.964 |
17 | 0.965 | 0.064 | 0.934 | 0.749 | 63.512 | 547 | 11 | 0.401 |
18 | 0.991 | 0.076 | 0.924 | 0.749 | 62.558 | 396 | 11 | 0.290 |
19 | 0.943 | 0.064 | 0.932 | 0.639 | 63.498 | 547 | 11 | 0.401 |
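
The two derived columns can be cross-checked against the raw columns. The table does not state how they were computed, so the formulas below are assumptions inferred from the numbers: the reduction rate appears consistent with (before − after) / before, and the retention ratio appears consistent with (records × predictors) / (1155 × 13), where 1155 and 13 are simply the largest record and predictor counts that occur in the table. The function and constant names are hypothetical, and only a few rows are copied in as a sketch.

```python
# Rows copied from the table above:
# (dataset, risk_before, risk_after, reported_reduction,
#  records, predictors, reported_retention)
ROWS = [
    (1, 0.993, 0.064, 0.936, 547, 11, 0.401),
    (5, 0.908, 0.044, 0.952, 954, 12, 0.762),
    (7, 0.908, 0.000, 1.000, 1119, 7, 0.522),
    (11, 0.963, 0.002, 0.998, 1155, 9, 0.692),
    (13, 0.135, 0.014, 0.897, 1113, 13, 0.964),
]

# ASSUMPTION: baseline size inferred from the table maxima,
# not stated in the source.
BASE_RECORDS = 1155
BASE_PREDICTORS = 13


def risk_reduction(before, after):
    """Assumed formula: relative drop in re-identification risk."""
    return (before - after) / before


def retention_ratio(records, predictors):
    """Assumed formula: share of record-by-predictor cells retained."""
    return records * predictors / (BASE_RECORDS * BASE_PREDICTORS)


def max_deviation(rows):
    """Largest gap between recomputed and reported values, per column."""
    red_gap = max(abs(risk_reduction(b, a) - red) for _, b, a, red, _, _, _ in rows)
    ret_gap = max(abs(retention_ratio(n, p) - ret) for _, _, _, _, n, p, ret in rows)
    return red_gap, ret_gap


if __name__ == "__main__":
    for ds, b, a, red, n, p, ret in ROWS:
        print(ds, round(risk_reduction(b, a), 3), red,
              round(retention_ratio(n, p), 3), ret)
```

Small gaps between recomputed and reported values are expected, because the inputs in the table are already rounded to three decimals.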