Table 2 Performance of the individual components of the de-identification method, each applied separately to the Yellow Card test data

From: Automated redaction of names in adverse event reports using transformer-based neural networks

Precision, recall, F1 and false positive rate give performance in the YC test data; NAMES and NON-NAMES give the number of tokens of each class in the YC test data.

Token length  Component     Precision  Recall  F1    False positive rate  NAMES  NON-NAMES
All           BERT          55%        87%     67%   0.05%                179    263,272
All           BERT + rules  26%        88%     40%   0.17%                179    263,272
All           Rules alone   13%        26%     17%   0.12%                179    263,272
Long (> 3)    BERT          58%        94%     72%   0.04%                108    162,582
Long (> 3)    BERT + rules  32%        96%     48%   0.14%                108    162,582
Long (> 3)    Rules alone   20%        31%     24%   0.08%                108    162,582
Short (≤ 3)   BERT          50%        75%     60%   0.05%                71     100,690
Short (≤ 3)   BERT + rules  19%        75%     30%   0.23%                71     100,690
Short (≤ 3)   Rules alone   6%         17%     9%    0.17%                71     100,690

  1. Bold: best score. YC: Yellow Card.
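The F1 column is the harmonic mean of precision and recall, and the false positive rate can be roughly cross-checked against the token counts. The Python sketch below recomputes both from the rounded table values. It assumes standard token-level definitions, in particular that the false positive rate is false positives divided by the number of NON-NAMES tokens; the table does not state this explicitly. The helper names `f1` and `approx_false_positive_rate` are hypothetical.

```python
"""Sanity-check sketch for Table 2: recompute F1 and estimate the false positive rate.

Assumption: token-level precision/recall, and FPR = FP / (number of non-name tokens).
Table percentages are rounded, so recomputed values are approximate.
"""


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def approx_false_positive_rate(precision: float, recall: float,
                               n_names: int, n_non_names: int) -> float:
    """Estimate FPR from rounded precision/recall and class token counts.

    TP ~= recall * n_names; predicted positives ~= TP / precision;
    FP = predicted positives - TP.
    """
    tp = recall * n_names
    fp = tp / precision - tp
    return fp / n_non_names


# (component, precision, recall) for the "All" rows: 179 name and 263,272 non-name tokens
ROWS_ALL = [
    ("BERT", 0.55, 0.87),
    ("BERT + rules", 0.26, 0.88),
    ("Rules alone", 0.13, 0.26),
]

if __name__ == "__main__":
    for name, p, r in ROWS_ALL:
        fpr = approx_false_positive_rate(p, r, 179, 263_272)
        print(f"{name:13s} F1={f1(p, r):.0%} FPR~{fpr:.2%}")
```

For the three "All" rows, the recomputed F1 values (67%, 40%, 17%) and estimated false positive rates (0.05%, 0.17%, 0.12%) agree with the table to the stated rounding. This is consistent with reading the false positive rate as a rate per non-name token.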