Table 4

Data file contents and counts for annotation hierarchy subtasks.

File contents                                                    Training data count  Test data count
Documents – PMIDs                                                504                  378
Genes – Gene symbol, MGI identifier, and gene name for all used  1294                 777
Document gene pairs – PMID-gene pairs                            1418                 877
Positive examples – PMIDs                                        178                  149
Positive examples – PMID-gene pairs                              346                  295
Positive examples – PMID-gene-domain tuples                      589                  495
Positive examples – PMID-gene-domain-evidence tuples             640                  522
Positive examples – all PMID-gene-GO-evidence tuples             872                  693
Negative examples – PMIDs                                        326                  229
Negative examples – PMID-gene pairs                              1072                 582

Cohen and Hersh Journal of Biomedical Discovery and Collaboration 2006 1:4   doi:10.1186/1747-5333-1-4