From: Automated redaction of names in adverse event reports using transformer-based neural networks
Performance and token counts in YC test data:

Token length | Component | Precision | Recall | F1 | False positive rate | NAMES (tokens) | NON-NAMES (tokens) |
---|---|---|---|---|---|---|---|
All | BERT | 55% | 87% | 67% | 0.05% | 179 | 263,272 |
All | BERT + rules | 26% | 88% | 40% | 0.17% | 179 | 263,272 |
All | Rules alone | 13% | 26% | 17% | 0.12% | 179 | 263,272 |
Long (> 3) | BERT | 58% | 94% | 72% | 0.04% | 108 | 162,582 |
Long (> 3) | BERT + rules | 32% | 96% | 48% | 0.14% | 108 | 162,582 |
Long (> 3) | Rules alone | 20% | 31% | 24% | 0.08% | 108 | 162,582 |
Short (≤ 3) | BERT | 50% | 75% | 60% | 0.05% | 71 | 100,690 |
Short (≤ 3) | BERT + rules | 19% | 75% | 30% | 0.23% | 71 | 100,690 |
Short (≤ 3) | Rules alone | 6% | 17% | 9% | 0.17% | 71 | 100,690 |
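
As a quick sanity check on the table, the F1 values can be recomputed from the reported precision and recall. This is a minimal sketch that assumes F1 is the usual harmonic mean of precision and recall; the table itself does not state the formula. The names `f1_score`, `REPORTED`, and `max_f1_discrepancy` are hypothetical and not taken from the paper. Because the published percentages are rounded, agreement can only be expected to within about one percentage point.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (token length, component) -> (precision, recall, reported F1), copied from the table.
REPORTED = {
    ("All", "BERT"): (0.55, 0.87, 0.67),
    ("All", "BERT + rules"): (0.26, 0.88, 0.40),
    ("All", "Rules alone"): (0.13, 0.26, 0.17),
    ("Long", "BERT"): (0.58, 0.94, 0.72),
    ("Long", "BERT + rules"): (0.32, 0.96, 0.48),
    ("Long", "Rules alone"): (0.20, 0.31, 0.24),
    ("Short", "BERT"): (0.50, 0.75, 0.60),
    ("Short", "BERT + rules"): (0.19, 0.75, 0.30),
    ("Short", "Rules alone"): (0.06, 0.17, 0.09),
}


def max_f1_discrepancy(rows=REPORTED) -> float:
    """Largest absolute gap between recomputed and reported F1 (as fractions)."""
    return max(abs(f1_score(p, r) - f1) for p, r, f1 in rows.values())


if __name__ == "__main__":
    # Print recomputed and reported F1 side by side for each row.
    for (length, component), (p, r, f1) in REPORTED.items():
        print(f"{length:5s} {component:12s} recomputed = {f1_score(p, r):.3f}, reported = {f1:.2f}")
    print(f"max discrepancy = {max_f1_discrepancy():.3f}")
```

Under that assumption, all nine rows agree with the reported F1 to within one percentage point, which is consistent with rounding of the published precision and recall.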