Roadmap for refactoring corpora. The list of corpora was drawn from [32] and [33], which provide links to the corpora. Column headings indicate the steps that a corpus may need to undergo to be refactored; a dot marks each corpus that would require that step. "Get original" means the original text needs to be retrieved. "Detect spans" means the corpus is a metadata corpus, so the spans of entities need to be detected. "Alt. search" means techniques other than exact-match searching must be used.
| Corpus | Get original | Detect spans | Alt. search |
|---|---|---|---|
| Arabidopsis Thaliana Circadian Rhythms [34] | • | | |
| Bio1 [35] | • | | |
| BioCreative 2004 Task 1A [28] | • | • | |
| BioCreative 2004 Task 1B [36] | • | • | |
| BioCreative 2004 Task 2 [37] | • | • | |
| BioCreative 2006 Task GM [38] | | | |
| BioCreative 2006 Task GN [39] | | | |
| BioCreative 2006 Task IPS/IMS [40] | • | • | |
| BioCreative 2006 Task ISS [40] | • | | |
| BioInfer [41] | | | |
| BioText: Recognizing Abbreviation Definitions [42] | | | |
| BioText: Protein-Protein Interaction Data [43] | • | • | |
| BioText: Relations between Disease/Treatment Entities [44] | • | | |
| Brown-Genia Treebank [45] | • | | |
| DepGenia [46] | • | | |
| DIPPPI [47] | • | • | |
| EDGAR [48] | • | • | |
| GENIA [49, 50] | • | | |
| FetchProt [51] | | | |
| Human Gene ID-Serve | • | | |
| IEPA [52] | • | • | |
| ImmunoTome | • | | |
| iProLink [53] | | | |
| Medstract [54, 55] | | | |
| MedTag [7] | | | |
| OHSUMED [56, 57] | • | • | • |
| PASBio [58] | • | | |
| PASTA [59] | | | |
| PathBinder [60] | | | |
| PennBioIE [12] | | | |
| PICorpus | | | |
| ProSpecTome [61] | • | • | |
| PDG [9] | • | • | • |
| Texas [62] | • | • | |
| TREC Genomics 2004 Categorization Task [63] | • | • | |
| TREC Genomics 2005 Categorization Task [64] | • | • | |
| TREC Genomics 2006 IR Task [65] | • | • | |
| TREC Genomics 2007 IR Task [65] | • | • | |
| Wisconsin [66] | • | • | • |
| WSD [67] | | | |
| Yapex [68, 69] | • | | |
Johnson et al. Journal of Biomedical Discovery and Collaboration 2007 2:4 doi:10.1186/1747-5333-2-4