Table 2 The number of PHI words for training data

From: De-identification of clinical notes with pseudo-labeling using regular expression rules and pre-trained BERT

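| Training level | DAT | PER | ORG | NUM | LOC | ETC |
|---|---|---|---|---|---|---|
| Small (8,000) | 12,618 | 721 | 847 | 17 | 8 | 15 |
| Medium (16,000) | 25,425 | 1,475 | 1,765 | 39 | 15 | 25 |
| Large1 (32,000) | 50,907 | 2,909 | 3,522 | 190 | 79 | 143 |
| Large2 (31,884) | 51,158 | 2,969 | 1,957 | 215 | 83 | 272 |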