From: A study on large-scale disease causality discovery from biomedical literature
Dataset | Feature | Disease entity | Relation pair | Performance |
---|---|---|---|---|
Method in this study | Constructs a disease causality semantic predicate list to facilitate the automatic identification of disease causalities | Contains 14,335 standardized disease entities | Includes 6,084 types of bidirectional relations (66,393 SPOs) and 92,557 types of unidirectional relations (17,608 SPOs) | Achieves an accuracy of 96.97% in disease causality extraction |
dRiskKB | Uses a text corpus of 21,354,075 MEDLINE records and disease risk-specific syntactic patterns to automatically extract disease risk pairs | Covers 12,981 diseases | Consists of 34,448 unique disease relation pairs | Average precision of 0.99 for the identified patterns, 0.919 for exactly matched pairs, and 0.988 for partially matched pairs |
A publicly available DDAE dataset extracted from literature [10] | Consists of 521 PubMed abstracts containing positive, negative, and null DDAs; dependency tree-based relation rules and DNorm are used to annotate disease mentions | Contains 12,346 diseases | Consists of 3,322 disease-disease pairs | Annotated DDAE dataset with a final kappa value of 76% |