Bio-ML Track
General description
The 2022 edition involves the following ontologies: OMIM (Online Mendelian Inheritance in Man), ORDO (Orphanet Rare Disease Ontology), NCIT (National Cancer Institute Thesaurus), DOID (Human Disease Ontology), FMA (Foundational Model of Anatomy), and SNOMED CT.
Specifically, we have the following five ontology pairs for matching: OMIM-ORDO, NCIT-DOID, SNOMED-FMA (Body), SNOMED-NCIT (Pharm), and SNOMED-NCIT (Neoplas).
Each ontology pair has four concrete tasks: unsupervised and semi-supervised equivalence matching, and unsupervised and semi-supervised subsumption matching.
We welcome systems to participate in all of them or only some of them.
Resources
Dataset download: OAEI 2022 Version
Instructions on how to use and evaluate the dataset: Bio-ML Documentation
Resource paper for full details: Revised ArXiv Version
Evaluation
As this is the first edition of this track, we invite systems to directly upload their mapping results.
Each unsupervised task has a validation mapping set (val.tsv), which can be used for system development, and a testing mapping set;
each semi-supervised task has an additional training mapping set (train.tsv), which can be used for training a system with ML modules.
In both kinds of task, a system will be evaluated according to the testing set. It should produce and upload a result file that includes the testing mappings in test.tsv, and/or another result file that ranks the candidate target classes for each source class in test.cands.tsv.
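For reference, below is a minimal Python sketch of loading these mapping sets with pandas. The column names (SrcEntity, TgtEntity, Score) are taken from the Bio-ML documentation and should be verified against your downloaded copy of the dataset.

```python
import pandas as pd

# Load the reference mapping sets shipped with each task.
# Column names (SrcEntity, TgtEntity, Score) follow the Bio-ML
# documentation; verify them against your downloaded files.
val_maps = pd.read_csv("val.tsv", sep="\t")    # for system development
test_maps = pd.read_csv("test.tsv", sep="\t")  # for final evaluation

# Semi-supervised tasks additionally provide train.tsv:
# train_maps = pd.read_csv("train.tsv", sep="\t")

print(val_maps.columns.tolist())
print(len(test_maps), "reference test mappings")
```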
All the result files of a system should be packaged (.zip or .tar.gz) and submitted via this form.
When submitting the form, please select the included result files, or "None of the above", for each ontology pair.
The deadline for result submission is October 1, 2022.
Submission File Requirements
For each equivalence matching task, please submit one or both of the following two TSV files (with headers) for the unsupervised (or semi-supervised) setting:
A mapping file for global matching evaluation, where each line is a mapping consisting of two classes (IRIs) and a score, separated by tabs (see example file).
A mapping ranking file for local ranking evaluation, where each line contains a source class and a ranked list of candidate target classes. The candidate target classes to rank are given in test.cands.tsv. Note that not only the classes in the TgtCandidates column but also the class in the TgtEntity column should be included and ranked (see the sketch after this list).
The ranked classes can be left unscored (see example file with no scores) or scored (see example file with scores).
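As an illustration, here is a minimal Python sketch of building a scored ranking file from test.cands.tsv. The column names (SrcEntity, TgtEntity, TgtCandidates) and the tuple-style serialisation of candidates are assumptions based on the Bio-ML documentation; the exact output format should follow the example files linked above, and the score function is a hypothetical stub to be replaced by your system's scorer.

```python
import ast
import pandas as pd

def score(src_iri: str, tgt_iri: str) -> float:
    """Hypothetical stub: replace with your system's matching score."""
    return 0.0

cands = pd.read_csv("test.cands.tsv", sep="\t")
rows = []
for _, row in cands.iterrows():
    # TgtCandidates is assumed to hold a tuple of IRIs serialised as a
    # string literal; the reference class in TgtEntity must also be ranked.
    candidates = set(ast.literal_eval(row["TgtCandidates"]))
    candidates.add(row["TgtEntity"])
    ranked = sorted(((c, score(row["SrcEntity"], c)) for c in candidates),
                    key=lambda pair: pair[1], reverse=True)
    rows.append({
        "SrcEntity": row["SrcEntity"],
        "TgtEntity": row["TgtEntity"],
        "TgtCandidates": str(tuple(ranked)),  # scored variant
    })

pd.DataFrame(rows).to_csv("ranking_result.tsv", sep="\t", index=False)
```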
Each subsumption matching task only supports local ranking evaluation, so only the mapping ranking file needs to be packaged and submitted for the unsupervised (or semi-supervised) setting.
Metrics
With the mapping file for global matching, we will calculate Precision, Recall, and F1 score; with the mapping ranking file for local ranking evaluation, we will calculate Mean Reciprocal Rank (MRR) and Hits@K.
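For clarity, the sketch below shows how these metrics are typically computed over sets of mappings and per-source rankings. It is an illustrative reference under the stated input conventions, not the official evaluation script.

```python
def precision_recall_f1(predicted: set, reference: set):
    """Global matching metrics over sets of (src, tgt) mapping pairs."""
    true_positives = len(predicted & reference)
    p = true_positives / len(predicted) if predicted else 0.0
    r = true_positives / len(reference) if reference else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def mrr_and_hits_at_k(rankings, k=1):
    """Local ranking metrics.

    `rankings` maps each source class to a pair of the reference target
    class and the ranked candidate list (best candidate first).
    """
    if not rankings:
        return 0.0, 0.0
    reciprocal_ranks, hits = 0.0, 0
    for target, ranked in rankings.values():
        if target in ranked:
            rank = ranked.index(target) + 1  # ranks are 1-based
            reciprocal_ranks += 1.0 / rank
            hits += int(rank <= k)
    n = len(rankings)
    return reciprocal_ranks / n, hits / n
```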
Results
Equivalence Matching
For equivalence matching, we report both the global matching and local ranking results.