OAEI Utilities

This page documents the utility functions used in the OAEI (Ontology Alignment Evaluation Initiative).

get_ignored_class_index(onto)

Get an index for filtering classes that are marked as not used in alignment.

This is indicated by the special class annotation use_in_alignment with the following IRI: http://oaei.ontologymatching.org/bio-ml/ann/use_in_alignment

Source code in src/deeponto/align/oaei.py
def get_ignored_class_index(onto: Ontology):
    """Get an index for filtering classes that are marked as not used in alignment.

    This is indicated by the special class annotation `use_in_alignment` with the following IRI:
        http://oaei.ontologymatching.org/bio-ml/ann/use_in_alignment
    """
    ignored_class_index = defaultdict(lambda: False)
    for class_iri, class_obj in onto.owl_classes.items():
        use_in_alignment = onto.get_annotations(
            class_obj, "http://oaei.ontologymatching.org/bio-ml/ann/use_in_alignment"
        )
        if use_in_alignment and str(use_in_alignment[0]).lower() == "false":
            ignored_class_index[class_iri] = True
    return ignored_class_index
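
A minimal standalone sketch (with hypothetical IRIs) of why the returned index is a `defaultdict(lambda: False)`: lookups for class IRIs that were never marked simply yield `False`, so callers can index into it without membership checks.

```python
from collections import defaultdict

# Hypothetical index as returned by get_ignored_class_index: only classes
# annotated with use_in_alignment=false are stored; everything else
# defaults to False on lookup.
ignored_class_index = defaultdict(lambda: False)
ignored_class_index["http://example.org/onto#C1"] = True  # marked as ignored

print(ignored_class_index["http://example.org/onto#C1"])  # True
print(ignored_class_index["http://example.org/onto#C2"])  # False (default)
```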

remove_ignored_mappings(mappings, ignored_class_index)

Filter prediction mappings that involve classes to be ignored.

Source code in src/deeponto/align/oaei.py
def remove_ignored_mappings(mappings: List[EntityMapping], ignored_class_index: dict):
    """Filter prediction mappings that involve classes to be ignored."""
    results = []
    for m in mappings:
        if ignored_class_index[m.head] or ignored_class_index[m.tail]:
            continue
        results.append(m)
    return results
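
The filtering logic can be illustrated standalone with a stand-in mapping type (the real `EntityMapping` lives in `deeponto.align.mapping`; the IRIs below are hypothetical): a mapping is dropped as soon as either its head or its tail appears in the ignored index.

```python
from collections import defaultdict
from typing import NamedTuple

# Stand-in exposing just the attributes the filter reads.
class Mapping(NamedTuple):
    head: str
    tail: str

ignored = defaultdict(lambda: False, {"http://example.org/onto#Ignored": True})
mappings = [
    Mapping("http://example.org/onto#A", "http://example.org/onto#B"),
    Mapping("http://example.org/onto#A", "http://example.org/onto#Ignored"),
]
# Same rule as remove_ignored_mappings: skip if head or tail is ignored.
kept = [m for m in mappings if not (ignored[m.head] or ignored[m.tail])]
print(len(kept))  # 1
```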

matching_eval(pred_maps_file, ref_maps_file, null_ref_maps_file=None, ignored_class_index=None, pred_maps_threshold=None)

Conduct global matching evaluation for the prediction mappings against the reference mappings.

The prediction mappings are formatted the same as full.tsv (the full reference mappings), with three columns: "SrcEntity", "TgtEntity", and "Score", indicating the source class IRI, the target class IRI, and the corresponding mapping score.

An ignored_class_index needs to be constructed for omitting prediction mappings that involve a class marked as not used in alignment.

Use the following code to obtain such an index for both the source and target ontologies:

ignored_class_index = get_ignored_class_index(src_onto)
ignored_class_index.update(get_ignored_class_index(tgt_onto))
Source code in src/deeponto/align/oaei.py
def matching_eval(
    pred_maps_file: str,
    ref_maps_file: str,
    null_ref_maps_file: Optional[str] = None,
    ignored_class_index: Optional[dict] = None,
    pred_maps_threshold: Optional[float] = None,
):
    r"""Conduct **global matching** evaluation for the prediction mappings against the
    reference mappings.

    The prediction mappings are formatted the same as `full.tsv` (the full reference mappings),
    with three columns: `"SrcEntity"`, `"TgtEntity"`, and `"Score"`, indicating the source
    class IRI, the target class IRI, and the corresponding mapping score.

    An `ignored_class_index` needs to be constructed for omitting prediction mappings
    that involve a class marked as **not used in alignment**.

    Use the following code to obtain such an index for both the source and target ontologies:

    ```python
    ignored_class_index = get_ignored_class_index(src_onto)
    ignored_class_index.update(get_ignored_class_index(tgt_onto))
    ```
    """
    refs = ReferenceMapping.read_table_mappings(ref_maps_file, relation="=")
    preds = EntityMapping.read_table_mappings(pred_maps_file, relation="=", threshold=pred_maps_threshold)
    if ignored_class_index:
        preds = remove_ignored_mappings(preds, ignored_class_index)
    null_refs = ReferenceMapping.read_table_mappings(null_ref_maps_file, relation="=") if null_ref_maps_file else []
    results = AlignmentEvaluator.f1(preds, refs, null_reference_mappings=null_refs)
    return results
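
For reference, the three-column, tab-separated prediction file expected by `matching_eval` can be produced with the standard library alone; the IRIs and file path below are hypothetical:

```python
import csv
import os
import tempfile

# Hypothetical prediction mappings in the full.tsv format:
# "SrcEntity", "TgtEntity", "Score" columns, tab-separated.
rows = [
    ("http://example.org/src#A", "http://example.org/tgt#A", 0.97),
    ("http://example.org/src#B", "http://example.org/tgt#B", 0.42),
]
path = os.path.join(tempfile.mkdtemp(), "preds.tsv")
with open(path, "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerow(["SrcEntity", "TgtEntity", "Score"])
    writer.writerows(rows)
```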

read_candidate_mappings(cand_maps_file, for_biollm=False, threshold=0.0)

Load scored or already ranked candidate mappings.

The predicted candidate mappings are formatted the same as test.cands.tsv, with three columns: "SrcEntity", "TgtEntity", and "TgtCandidates", indicating the source reference class IRI, the target reference class IRI, and a list of tuples in the form of (target_candidate_class_IRI, score) where score is optional if the candidate mappings have been ranked. For the Bio-LLM special sub-track, "TgtCandidates" refers to a list of triples in the form of (target_candidate_class_IRI, score, answer) where the answer is required for computing matching scores.

This method loads the candidate mappings in this format and parses them into the inputs of mean_reciprocal_rank and hits_at_K.

For Bio-LLM, the true prediction mappings and reference mappings will also be generated for the matching evaluation, i.e., the inputs of f1.
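
Concretely, a `"TgtCandidates"` cell is a string representation of a Python list that can be parsed with `ast.literal_eval` (a safe alternative to `eval`); the cell values below are hypothetical examples of the two variants:

```python
import ast

# Hypothetical "TgtCandidates" cell values as stored in test.cands.tsv.
scored = "[('http://example.org/tgt#A', 0.9), ('http://example.org/tgt#B', 0.3)]"
biollm = "[('http://example.org/tgt#A', 0.9, True), ('http://example.org/tgt#B', 0.3, False)]"

cands = ast.literal_eval(scored)    # list of (IRI, score) tuples
triples = ast.literal_eval(biollm)  # list of (IRI, score, answer) triples
print(cands[0][1], triples[0][2])   # 0.9 True
```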

Source code in src/deeponto/align/oaei.py
def read_candidate_mappings(cand_maps_file: str, for_biollm: bool = False, threshold: float = 0.0):
    r"""Load scored or already ranked candidate mappings.

    The predicted candidate mappings are formatted the same as `test.cands.tsv`, with three columns:
    `"SrcEntity"`, `"TgtEntity"`, and `"TgtCandidates"`, indicating the source reference class IRI, the
    target reference class IRI, and a list of **tuples** in the form of `(target_candidate_class_IRI, score)` where
    `score` is optional if the candidate mappings have been ranked. For the Bio-LLM special sub-track, `"TgtCandidates"`
    refers to a list of **triples** in the form of `(target_candidate_class_IRI, score, answer)` where the `answer` is
    required for computing matching scores.

    This method loads the candidate mappings in this format and parses them into the inputs of [`mean_reciprocal_rank`][deeponto.align.evaluation.AlignmentEvaluator.mean_reciprocal_rank]
    and [`hits_at_K`][deeponto.align.evaluation.AlignmentEvaluator.hits_at_K].

    For Bio-LLM, the true prediction mappings and reference mappings will also be generated for the matching evaluation, i.e., the inputs of [`f1`][deeponto.align.evaluation.AlignmentEvaluator.f1].
    """

    all_cand_maps = read_table(cand_maps_file).values.tolist()
    cands = []
    unmatched_cands = []
    preds = []  # only used for bio-llm
    refs = []  # only used for bio-llm

    for src_ref_class, tgt_ref_class, tgt_cands in all_cand_maps:
        ref_map = ReferenceMapping(src_ref_class, tgt_ref_class, "=")
        tgt_cands = eval(tgt_cands)
        has_score = all(not isinstance(x, str) for x in tgt_cands)
        cand_maps = []
        if tgt_ref_class != "UnMatched":
            refs.append(ref_map)
        if for_biollm:
            for t, s, a in tgt_cands:
                m = EntityMapping(src_ref_class, t, "=", s)
                cand_maps.append(m)
                if a is True and s >= threshold:  # keep predicted "true" mappings above the threshold
                    preds.append(m)
        elif has_score:
            cand_maps = [EntityMapping(src_ref_class, t, "=", s) for t, s in tgt_cands]
        else:
            warnings.warn("Input candidate mappings do not have a score, assume default rank in descending order.")
            cand_maps = [
                EntityMapping(src_ref_class, t, "=", (len(tgt_cands) - i) / len(tgt_cands))
                for i, t in enumerate(tgt_cands)
            ]
        cand_maps = EntityMapping.sort_entity_mappings_by_score(cand_maps)
        if for_biollm and tgt_ref_class == "UnMatched":
            unmatched_cands.append((ref_map, cand_maps))
        else:
            cands.append((ref_map, cand_maps))

    if for_biollm:
        return cands, unmatched_cands, preds, refs
    else:
        return cands
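
When candidates arrive without scores, the method assigns descending pseudo-scores `(len - i) / len` so that the given order is preserved as a ranking. A standalone illustration (with hypothetical IRIs):

```python
# Four already-ranked candidates without scores: earlier positions
# receive higher pseudo-scores, preserving the input order as a ranking.
tgt_cands = ["http://example.org/tgt#A", "http://example.org/tgt#B",
             "http://example.org/tgt#C", "http://example.org/tgt#D"]
scores = [(len(tgt_cands) - i) / len(tgt_cands) for i, _ in enumerate(tgt_cands)]
print(scores)  # [1.0, 0.75, 0.5, 0.25]
```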

ranking_result_file_check(cand_maps_file, ref_cand_maps_file)

Check if the ranking result file is formatted correctly as the original test.cands.tsv file provided in the dataset.

Source code in src/deeponto/align/oaei.py
def ranking_result_file_check(cand_maps_file: str, ref_cand_maps_file: str):
    r"""Check if the ranking result file is formatted correctly as the original
    `test.cands.tsv` file provided in the dataset.
    """
    formatted_cand_maps = read_candidate_mappings(cand_maps_file)
    formatted_ref_cand_maps = read_candidate_mappings(ref_cand_maps_file)
    assert len(formatted_cand_maps) == len(
        formatted_ref_cand_maps
    ), f"Mismatched number of reference mappings: {len(formatted_cand_maps)}; should be {len(formatted_ref_cand_maps)}."
    for i in range(len(formatted_cand_maps)):
        anchor, cands = formatted_cand_maps[i]
        ref_anchor, ref_cands = formatted_ref_cand_maps[i]
        assert (
            anchor.to_tuple() == ref_anchor.to_tuple()
        ), f"Mismatched reference mapping: {anchor}; should be {ref_anchor}."
        cands = [c.to_tuple() for c in cands]
        ref_cands = [rc.to_tuple() for rc in ref_cands]
        assert not (
            set(cands) - set(ref_cands)
        ), f"Mismatched set of candidate mappings for the reference mapping: {anchor}."

ranking_eval(cand_maps_file, Ks=[1, 5, 10])

Conduct local ranking evaluation for the scored or ranked candidate mappings.

See read_candidate_mappings for the file format and loading.

Source code in src/deeponto/align/oaei.py
def ranking_eval(cand_maps_file: str, Ks=[1, 5, 10]):
    r"""Conduct **local ranking** evaluation for the scored or ranked candidate mappings.

    See [`read_candidate_mappings`][deeponto.align.oaei.read_candidate_mappings] for the file format and loading.
    """
    formatted_cand_maps = read_candidate_mappings(cand_maps_file)
    results = {"MRR": AlignmentEvaluator.mean_reciprocal_rank(formatted_cand_maps)}
    for K in Ks:
        results[f"Hits@{K}"] = AlignmentEvaluator.hits_at_K(formatted_cand_maps, K=K)
    return results
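
The reported metrics can be computed by hand from the 1-indexed rank of the correct target in each candidate list; a toy computation (hypothetical ranks, mirroring the definitions of MRR and Hits@K):

```python
# 1-indexed rank of the reference target class per anchor mapping.
ranks = [1, 3, 2]

# MRR: mean of reciprocal ranks; Hits@K: fraction of anchors whose
# correct target is ranked within the top K.
mrr = sum(1.0 / r for r in ranks) / len(ranks)
hits_at_1 = sum(r <= 1 for r in ranks) / len(ranks)
print(round(mrr, 4), round(hits_at_1, 4))  # 0.6111 0.3333
```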

is_rejection(preds, cands)

A successful rejection means none of the candidate mappings are predicted as true mappings.

Source code in src/deeponto/align/oaei.py
def is_rejection(preds: List[EntityMapping], cands: List[EntityMapping]):
    """A successful rejection means none of the candidate mappings are predicted as true mappings."""
    return set([p.to_tuple() for p in preds]).intersection(set([c.to_tuple() for c in cands])) == set()
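
In miniature (with hypothetical mapping tuples): rejection succeeds exactly when the set of predicted mapping tuples is disjoint from the candidate tuples.

```python
# A rejection succeeds iff no predicted mapping appears among the
# candidates, i.e. the set intersection is empty.
preds = {("src#A", "tgt#A", "=", 0.9)}
cands = {("src#A", "tgt#X", "=", 0.4), ("src#A", "tgt#Y", "=", 0.2)}
print(preds.isdisjoint(cands))  # True
```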

biollm_eval(cand_maps_file, Ks=[1], threshold=0.0)

Conduct Bio-LLM evaluation for the Bio-LLM formatted candidate mappings.

See read_candidate_mappings for the file format and loading.

Source code in src/deeponto/align/oaei.py
def biollm_eval(cand_maps_file, Ks=[1], threshold: float = 0.0):
    r"""Conduct Bio-LLM evaluation for the Bio-LLM formatted candidate mappings.

    See [`read_candidate_mappings`][deeponto.align.oaei.read_candidate_mappings] for the file format and loading.
    """
    matched_cand_maps, unmatched_cand_maps, preds, refs = read_candidate_mappings(
        cand_maps_file, for_biollm=True, threshold=threshold
    )

    results = AlignmentEvaluator.f1(preds, refs)
    for K in Ks:
        results[f"Hits@{K}"] = AlignmentEvaluator.hits_at_K(matched_cand_maps, K=K)
    results["MRR"] = AlignmentEvaluator.mean_reciprocal_rank(matched_cand_maps)
    rej = 0
    for _, cs in unmatched_cand_maps:
        rej += int(is_rejection(preds, cs))
    results["RR"] = rej / len(unmatched_cand_maps)
    return results

Last update: July 23, 2023
Created: July 23, 2023
GitHub: @Lawhy   Personal Page: yuanhe.wiki