Bio-ML Track
General description
The 2022 edition involves the following ontologies: OMIM (Online Mendelian Inheritance in Man), ORDO (Orphanet Rare Disease Ontology), NCIT (National Cancer Institute Thesaurus), DOID (Human Disease Ontology), FMA (Foundational Model of Anatomy), and SNOMED CT.
Specifically, we have the following five ontology pairs for matching: OMIM-ORDO, NCIT-DOID, SNOMED-FMA (Body), SNOMED-NCIT (Pharm), and SNOMED-NCIT (Neoplas).
Each ontology pair has four concrete tasks: unsupervised and semi-supervised equivalence matching, and unsupervised and semi-supervised subsumption matching.
We welcome systems to participate in all of them or only some of them.
Resources
Dataset download: OAEI 2022 Version
Instructions on how to use and evaluate the dataset: Bio-ML Documentation
Resource paper for full details: Revised ArXiv Version
Evaluation
As this is the first edition of this track, we invite systems to directly upload their mapping results.
Each unsupervised task has a validation mapping set (val.tsv), which can be used for system development, and a testing mapping set;
each semi-supervised task has an additional training mapping set (train.tsv), which can be used for training a system with ML modules.
In both kinds of task, a system will be evaluated according to the testing set. It should produce and upload a result file that includes the testing mappings in test.tsv, and/or another result file that ranks the candidate target classes for each source class in test.cands.tsv.
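For reference, below is a minimal Python sketch of loading these mapping sets with pandas. The column names (SrcEntity, TgtEntity, Score) are taken from the Bio-ML documentation and should be verified against your downloaded copy of the dataset.

```python
import pandas as pd

# Load the reference mapping sets shipped with each task.
# Column names (SrcEntity, TgtEntity, Score) follow the Bio-ML
# documentation; verify them against your downloaded files.
val_maps = pd.read_csv("val.tsv", sep="\t")    # for system development
test_maps = pd.read_csv("test.tsv", sep="\t")  # for final evaluation

# Semi-supervised tasks additionally provide train.tsv:
# train_maps = pd.read_csv("train.tsv", sep="\t")

print(val_maps.columns.tolist())
print(len(test_maps), "reference test mappings")
```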
All the result files of a system should be packaged (.zip or .tar.gz) and submitted via this form.
When submitting the form, please select the included result files, or "None of the above", for each ontology pair.
The deadline for result submission is October 1, 2022.
Submission File Requirements
For each equivalence matching task, please submit one or both of the following two TSV files (with headers) for the unsupervised (or semi-supervised) setting:
A mapping file for global matching evaluation, where each line is a mapping consisting of two classes (IRIs) and a score, separated by tabs (see example file).
A mapping ranking file for local ranking evaluation, where each line contains a source class and a ranked list of candidate target classes. The candidate target classes to rank are given in test.cands.tsv. Note that not only the classes in the TgtCandidates column but also the class in the TgtEntity column should be included and ranked (see the sketch after this list).
The ranked classes can be left unscored (see example file with no scores) or scored (see example file with scores).
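As an illustration, here is a minimal Python sketch of building a scored ranking file from test.cands.tsv. The column names (SrcEntity, TgtEntity, TgtCandidates) and the tuple-style serialisation of candidates are assumptions based on the Bio-ML documentation; the exact output format should follow the example files linked above, and the score function is a hypothetical stub to be replaced by your system's scorer.

```python
import ast
import pandas as pd

def score(src_iri: str, tgt_iri: str) -> float:
    """Hypothetical stub: replace with your system's matching score."""
    return 0.0

cands = pd.read_csv("test.cands.tsv", sep="\t")
rows = []
for _, row in cands.iterrows():
    # TgtCandidates is assumed to hold a tuple of IRIs serialised as a
    # string literal; the reference class in TgtEntity must also be ranked.
    candidates = set(ast.literal_eval(row["TgtCandidates"]))
    candidates.add(row["TgtEntity"])
    ranked = sorted(((c, score(row["SrcEntity"], c)) for c in candidates),
                    key=lambda pair: pair[1], reverse=True)
    rows.append({
        "SrcEntity": row["SrcEntity"],
        "TgtEntity": row["TgtEntity"],
        "TgtCandidates": str(tuple(ranked)),  # scored variant
    })

pd.DataFrame(rows).to_csv("ranking_result.tsv", sep="\t", index=False)
```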
Each subsumption matching task only supports local ranking evaluation, so only the mapping ranking file needs to be packaged and submitted for the unsupervised (or semi-supervised) setting.
Metrics
With the mapping file for global matching, we will calculate Precision, Recall, and F1 score; with the mapping ranking file for local ranking evaluation, we will calculate Mean Reciprocal Rank (MRR) and Hits@K.
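For clarity, the sketch below shows how these metrics are typically computed over sets of mappings and per-source rankings. It is an illustrative reference under the stated input conventions, not the official evaluation script.

```python
def precision_recall_f1(predicted: set, reference: set):
    """Global matching metrics over sets of (src, tgt) mapping pairs."""
    true_positives = len(predicted & reference)
    p = true_positives / len(predicted) if predicted else 0.0
    r = true_positives / len(reference) if reference else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def mrr_and_hits_at_k(rankings, k=1):
    """Local ranking metrics.

    `rankings` maps each source class to a pair of the reference target
    class and the ranked candidate list (best candidate first).
    """
    if not rankings:
        return 0.0, 0.0
    reciprocal_ranks, hits = 0.0, 0
    for target, ranked in rankings.values():
        if target in ranked:
            rank = ranked.index(target) + 1  # ranks are 1-based
            reciprocal_ranks += 1.0 / rank
            hits += int(rank <= k)
    n = len(rankings)
    return reciprocal_ranks / n, hits / n
```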
Results
Equivalence Matching
For equivalence matching, we report both the global matching and local ranking results.