From: De-identification of clinical notes with pseudo-labeling using regular expression rules and pre-trained BERT
Training level    DAT     PER   ORG   NUM  LOC  ETC
Small (8,000)     12,618  721   847   17   8    15
Medium (16,000)   25,425  1475  1765  39, 25 (one of the NUM, LOC, and ETC values is missing in the source)
Large1 (32,000)   50,907  2909  3522  190  79   143
Large2 (31,884)   51,158  2969  1957  215  83   272