A study on large-scale disease causality discovery from biomedical literature

Table 11 Comparison with existing disease-disease association extraction datasets

Dataset	Features	Disease entities	Relation pairs	Performance
Method in this study	Constructs a disease causality semantic predicate list to facilitate the automatic identification of disease causalities	Obtains 14,335 standardized disease entities	Includes 6,084 types of bidirectional relations (66,393 SPOs) and 92,557 types of unidirectional relations (17,608 SPOs)	Achieves an accuracy of 96.97% in disease causality extraction
dRiskKB	Uses disease risk-specific syntactic patterns to automatically extract disease risk pairs from a text corpus of 21,354,075 MEDLINE records	Covers 12,981 diseases	Consists of 34,448 unique disease relation pairs	The identified patterns have an average precision of 0.99; the corresponding values are 0.919 for exactly matched pairs and 0.988 for partially matched pairs
A publicly available DDAE dataset extracted from literature [10]	Consists of 521 PubMed abstracts containing positive, negative, and null DDAs; dependency tree-based relation rules and DNorm are used to annotate disease mentions	Contains 12,346 diseases	Consists of 3,322 disease-disease pairs	An annotated DDAE dataset with a final kappa value of 76%

ISSN: 1472-6947