BERTMap
Paper
\(\textsf{BERTMap}\) is proposed in the paper: BERTMap: A BERT-based Ontology Alignment System (AAAI-2022).
@inproceedings{he2022bertmap,
title={BERTMap: a BERT-based ontology alignment system},
author={He, Yuan and Chen, Jiaoyan and Antonyrajah, Denvar and Horrocks, Ian},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={36},
number={5},
pages={5684--5691},
year={2022}
}
\(\textsf{BERTMap}\) is a BERT-based ontology matching (OM) system consisting of the following components:
- Text semantics corpora construction from input ontologies, and optionally from input mappings and other auxiliary ontologies.
- BERT synonym classifier training on synonym and non-synonym samples in text semantics corpora.
- Sub-word Inverted Index construction from the tokenised class annotations for candidate selection in mapping prediction.
- Mapping Predictor which integrates a simple edit distance-based string matching module and the fine-tuned BERT synonym classifier for mapping scoring. For each source ontology class, narrow down target class candidates using the sub-word inverted index, apply string matching for "easy" mappings and then apply BERT matching.
- Mapping Refiner which consists of the mapping extension and mapping repair modules. Mapping extension is an iterative process based on the locality principle. Mapping repair utilises the LogMap's debugger.
\(\textsf{BERTMapLt}\) is a light-weight version of \(\textsf{BERTMap}\) without the BERT module and mapping refiner.
See the tutorial for \(\textsf{BERTMap}\) here.
BERTMapPipeline(src_onto, tgt_onto, config)
Class for the whole ontology alignment pipeline of \(\textsf{BERTMap}\) and \(\textsf{BERTMapLt}\) models.
Note
Parameters related to BERT training are `None` by default; they will be constructed for \(\textsf{BERTMap}\) and stay as `None` for \(\textsf{BERTMapLt}\).
Attributes:
| Name | Type | Description |
|---|---|---|
| `config` | `CfgNode` | The configuration for BERTMap or BERTMapLt. |
| `name` | `str` | The name of the model, either `bertmap` or `bertmaplt`. |
| `output_path` | `str` | The path to the output directory. |
| `src_onto` | `Ontology` | The source ontology to be matched. |
| `tgt_onto` | `Ontology` | The target ontology to be matched. |
| `annotation_property_iris` | `List[str]` | The annotation property IRIs used for extracting synonyms and non-synonyms. |
| `src_annotation_index` | `dict` | A dictionary that stores the class annotations of the source ontology, indexed by class IRIs. |
| `tgt_annotation_index` | `dict` | A dictionary that stores the class annotations of the target ontology, indexed by class IRIs. |
| `known_mappings` | `List[ReferenceMapping]` | List of known mappings for constructing the cross-ontology corpus. |
| `auxliary_ontos` | `List[Ontology]` | List of auxiliary ontologies for constructing any auxiliary corpus. |
| `corpora` | `dict` | A dictionary that stores the constructed text semantics corpora and related statistics. |
| `finetune_data` | `dict` | A dictionary that stores the training and validation samples for fine-tuning. |
| `bert` | `BERTSynonymClassifier` | A BERT model for synonym classification and mapping prediction. |
| `best_checkpoint` | `str` | The path to the best BERT checkpoint which will be loaded after training. |
| `mapping_predictor` | `MappingPredictor` | The predictor function based on class annotations, used for global matching or mapping scoring. |
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `src_onto` | `Ontology` | The source ontology for alignment. | required |
| `tgt_onto` | `Ontology` | The target ontology for alignment. | required |
| `config` | `CfgNode` | The configuration for BERTMap or BERTMapLt. | required |
Source code in src/deeponto/align/bertmap/pipeline.py
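A minimal usage sketch of the pipeline, based on the constructor and the `load_bertmap_config` static method documented on this page. The import paths, ontology file names, and the `output_path` config field are assumptions for illustration, not a definitive recipe:

```python
from deeponto.onto import Ontology                  # assumed import path
from deeponto.align.bertmap import BERTMapPipeline  # assumed import path

# Load the two ontologies to be matched (file names are placeholders).
src_onto = Ontology("./src_onto.owl")
tgt_onto = Ontology("./tgt_onto.owl")

# Load the default BERTMap configuration (pass a .yaml path to customise it).
config = BERTMapPipeline.load_bertmap_config()
config.output_path = "./bertmap_output"             # assumed config field

# Constructing the pipeline runs corpus construction, BERT fine-tuning,
# global matching, and mapping refinement as described above.
BERTMapPipeline(src_onto, tgt_onto, config)
```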
load_or_construct(data_file, data_name, construct_func, *args, **kwargs)
Load existing data or construct a new one.
An auxiliary function that checks the existence of a data file and loads it if it exists.
Otherwise, it constructs new data with the input construct_func
, which is supposed to generate
a local data file.
Source code in src/deeponto/align/bertmap/pipeline.py
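A plain-Python sketch of this load-or-construct pattern (the file handling and construct function are illustrative, not the library's exact implementation):

```python
import json
import os

def load_or_construct(data_file, data_name, construct_func, *args, **kwargs):
    # Reuse the saved data file if it already exists.
    if os.path.exists(data_file):
        with open(data_file, "r") as f:
            return json.load(f)
    # Otherwise construct the data; construct_func is expected to
    # produce a local data file as a side effect.
    print(f"Constructing new {data_name}...")
    return construct_func(*args, **kwargs)
```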
load_text_semantics_corpora()
Load or construct text semantics corpora.
See `TextSemanticsCorpora`.
Source code in src/deeponto/align/bertmap/pipeline.py
load_finetune_data()
Load or construct fine-tuning data from text semantics corpora.
Steps of constructing fine-tuning data from text semantics:
- Mix synonym and nonsynonym data.
- Randomly sample 90% as training samples and 10% as validation.
Source code in src/deeponto/align/bertmap/pipeline.py
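The two steps above can be sketched as follows (labels `1`/`0` for synonyms/non-synonyms and the 90/10 ratio follow the description; the data structure and seed are illustrative):

```python
import random

def build_finetune_data(synonyms, nonsynonyms, seed=888):
    # Mix synonym (label 1) and non-synonym (label 0) pairs.
    samples = [(a, b, 1) for a, b in synonyms] + [(a, b, 0) for a, b in nonsynonyms]
    random.Random(seed).shuffle(samples)
    # Randomly sample 90% as training samples and keep 10% for validation.
    split = int(0.9 * len(samples))
    return {"training": samples[:split], "validation": samples[split:]}
```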
load_bert_synonym_classifier()
Load the BERT model from a pre-trained or a local checkpoint.
- If loaded from pre-trained, it means to start training from a pre-trained model such as `bert-uncased`.
- If loaded from local, turn on the `eval` mode for mapping predictions.
- If `self.bert_resume_training` is `True`, it will be loaded from the latest saved checkpoint.
Source code in src/deeponto/align/bertmap/pipeline.py
load_best_checkpoint()
Find the best checkpoint by searching for trainer states in each checkpoint file.
Source code in src/deeponto/align/bertmap/pipeline.py
load_bertmap_config(config_file=None)
staticmethod
Load the BERTMap configuration in `.yaml`. If the file is not provided, use the default configuration.
Source code in src/deeponto/align/bertmap/pipeline.py
save_bertmap_config(config, config_file)
staticmethod
Save the BERTMap configuration in `.yaml`.
Source code in src/deeponto/align/bertmap/pipeline.py
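For example, the two static methods can be combined to materialise the default configuration as a local `.yaml` file for manual editing (the file name is a placeholder):

```python
from deeponto.align.bertmap import BERTMapPipeline  # assumed import path

config = BERTMapPipeline.load_bertmap_config()       # default configuration
BERTMapPipeline.save_bertmap_config(config, "./bertmap_config.yaml")
# Edit the saved file, then reload it:
config = BERTMapPipeline.load_bertmap_config("./bertmap_config.yaml")
```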
AnnotationThesaurus(onto, annotation_property_iris, apply_transitivity=False)
A thesaurus class for synonyms and non-synonyms extracted from an ontology.
Some related definitions of arguments here:
- A `synonym_group` is a set of annotation phrases that are synonymous to each other;
- The `transitivity` of synonyms means that if A and B are synonymous and B and C are synonymous, then A and C are synonymous. This is achieved by a connected graph-based algorithm.
- A `synonym_pair` is a pair of synonymous annotation phrases which can be extracted from the cartesian product of a `synonym_group` and itself. NOTE that reflexivity and symmetry are preserved, meaning that (i) every phrase A is a synonym of itself and (ii) if (A, B) is a synonym pair then (B, A) is a synonym pair, too.
Attributes:
| Name | Type | Description |
|---|---|---|
| `onto` | `Ontology` | An ontology to construct the annotation thesaurus from. |
| `annotation_index` | `Dict[str, Set[str]]` | An index of the class annotations, keyed by class IRIs. |
| `annotation_property_iris` | `List[str]` | A list of annotation property IRIs used to extract the annotations. |
| `average_number_of_annotations_per_class` | `int` | The average number of (extracted) annotations per ontology class. |
| `apply_transitivity` | `bool` | Apply synonym transitivity to merge synonym groups or not. |
| `synonym_groups` | `List[Set[str]]` | The list of synonym groups extracted from the ontology according to the specified annotation properties. |
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `onto` | `Ontology` | The input ontology to extract annotations from. | required |
| `annotation_property_iris` | `List[str]` | Specify which annotation properties to be used. | required |
| `apply_transitivity` | `bool` | Apply synonym transitivity to merge synonym groups or not. Defaults to `False`. | `False` |
Source code in src/deeponto/align/bertmap/text_semantics.py
get_synonym_pairs(synonym_group, remove_duplicates=True)
staticmethod
Get synonym pairs from a synonym group through a cartesian product.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `synonym_group` | `Set[str]` | A set of annotation phrases that are synonymous to each other. | required |
Returns:
| Type | Description |
|---|---|
| `List[Tuple[str, str]]` | A list of synonym pairs. |
Source code in src/deeponto/align/bertmap/text_semantics.py
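A plain-Python sketch of how such pairs can be produced from a synonym group via the cartesian product, keeping reflexive and symmetric pairs (not the library's exact implementation):

```python
from itertools import product

def get_synonym_pairs(synonym_group, remove_duplicates=True):
    # Cartesian product of the group with itself: (a, a) is kept (reflexivity),
    # and both (a, b) and (b, a) are kept (symmetry).
    pairs = list(product(synonym_group, repeat=2))
    if remove_duplicates:
        pairs = list(set(pairs))
    return pairs

print(get_synonym_pairs({"heart attack", "myocardial infarction"}))
```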
merge_synonym_groups_by_transitivity(synonym_groups)
staticmethod
Merge synonym groups by transitivity.
Synonym groups that share a common annotation phrase will be merged. NOTE that for multiple ontologies, we can merge their synonym groups by first concatenating them then use this function.
Note
In \(\textsf{BERTMap}\) experiments we have considered this as a data augmentation approach but it does not bring a significant performance improvement. However, if the overall number of annotations is not large enough then this could be a good option.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `synonym_groups` | `List[Set[str]]` | A sequence of synonym groups to be merged. | required |
Returns:
| Type | Description |
|---|---|
| `List[Set[str]]` | A list of merged synonym groups. |
Source code in src/deeponto/align/bertmap/text_semantics.py
connected_annotations(synonym_pairs)
staticmethod
Build a graph for adjacency among the class annotations (labels) such that the transitivity of synonyms is ensured.
Auxiliary function for `merge_synonym_groups_by_transitivity`.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `synonym_pairs` | `List[Tuple[str, str]]` | List of pairs of phrases that are synonymous. | required |
Returns:
| Type | Description |
|---|---|
| `List[Set[str]]` | A list of synonym groups. |
Source code in src/deeponto/align/bertmap/text_semantics.py
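A sketch of the connected-graph idea using `networkx` (the library's own implementation may differ): synonym pairs become edges, and each connected component is one transitively merged synonym group.

```python
import networkx as nx

def connected_annotations(synonym_pairs):
    # Build an undirected graph over annotation phrases; synonym pairs are edges.
    graph = nx.Graph()
    graph.add_edges_from(synonym_pairs)
    # Each connected component is a (transitively merged) synonym group.
    return [set(component) for component in nx.connected_components(graph)]

pairs = [("a", "b"), ("b", "c"), ("d", "e")]
print(connected_annotations(pairs))  # [{'a', 'b', 'c'}, {'d', 'e'}]
```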
synonym_sampling(num_samples=None)
Sample synonym pairs from a list of synonym groups extracted from the input ontology.
According to the \(\textsf{BERTMap}\) paper, synonyms are defined as label pairs that belong to the same ontology class.
NOTE this has been validated for getting the same results as in the original \(\textsf{BERTMap}\) repository.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `num_samples` | `int` | The (maximum) number of unique samples extracted. Defaults to `None`. | `None` |
Returns:
| Type | Description |
|---|---|
| `List[Tuple[str, str]]` | A list of unique synonym pair samples. |
Source code in src/deeponto/align/bertmap/text_semantics.py
soft_nonsynonym_sampling(num_samples, max_iter=5)
Sample soft non-synonyms from a list of synonym groups extracted from the input ontology.
According to the \(\textsf{BERTMap}\) paper, soft non-synonyms are defined as label pairs from two different synonym groups that are randomly selected.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `num_samples` | `int` | The (maximum) number of unique samples extracted; this is required unlike for synonym sampling because the non-synonym pool is significantly larger (considering random combinations of different synonym groups). | required |
| `max_iter` | `int` | The maximum number of iterations for conducting sampling. Defaults to `5`. | `5` |
Returns:
| Type | Description |
|---|---|
| `List[Tuple[str, str]]` | A list of unique (soft) non-synonym pair samples. |
Source code in src/deeponto/align/bertmap/text_semantics.py
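A sketch of soft non-synonym sampling under these definitions (illustrative only, not the library's implementation): repeatedly pick two different synonym groups, pair one label from each, and retry for at most `max_iter` rounds until enough unique samples are collected.

```python
import random

def soft_nonsynonym_sampling(synonym_groups, num_samples, max_iter=5):
    samples = set()
    for _ in range(max_iter):
        for _ in range(num_samples - len(samples)):
            # Pick two *different* synonym groups and one label from each.
            group1, group2 = random.sample(synonym_groups, 2)
            samples.add((random.choice(sorted(group1)), random.choice(sorted(group2))))
        if len(samples) >= num_samples:
            break
    return list(samples)
```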
weighted_random_choices_of_sibling_groups(k=1)
Randomly (weighted) select a number of sibling class groups.
The weights are computed according to the sizes of the sibling class groups.
Source code in src/deeponto/align/bertmap/text_semantics.py
hard_nonsynonym_sampling(num_samples, max_iter=5)
Sample hard non-synonyms from sibling classes of the input ontology.
According to the \(\textsf{BERTMap}\) paper, hard non-synonyms are defined as label pairs that belong to two disjoint ontology classes. For practical reasons, the condition is relaxed to two sibling ontology classes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `num_samples` | `int` | The (maximum) number of unique samples extracted; this is required unlike for synonym sampling because the non-synonym pool is significantly larger (considering random combinations of different synonym groups). | required |
| `max_iter` | `int` | The maximum number of iterations for conducting sampling. Defaults to `5`. | `5` |
Returns:
| Type | Description |
|---|---|
| `List[Tuple[str, str]]` | A list of unique (hard) non-synonym pair samples. |
Source code in src/deeponto/align/bertmap/text_semantics.py
IntraOntologyTextSemanticsCorpus(onto, annotation_property_iris, soft_negative_ratio=2, hard_negative_ratio=2)
Class for creating the intra-ontology text semantics corpus from an ontology.
As defined in the \(\textsf{BERTMap}\) paper, the intra-ontology text semantics corpus consists of synonym and non-synonym pairs extracted from the ontology class annotations.
Attributes:
| Name | Type | Description |
|---|---|---|
| `onto` | `Ontology` | An ontology to construct the intra-ontology text semantics corpus from. |
| `annotation_property_iris` | `List[str]` | Specify which annotation properties to be used. |
| `soft_negative_ratio` | `int` | The expected negative sample ratio of the soft non-synonyms to the extracted synonyms. Defaults to `2`. |
| `hard_negative_ratio` | `int` | The expected negative sample ratio of the hard non-synonyms to the extracted synonyms. Defaults to `2`. |
Source code in src/deeponto/align/bertmap/text_semantics.py
save(save_path)
Save the intra-ontology corpus (a `.json` file for label pairs and its summary) in the specified directory.
Source code in src/deeponto/align/bertmap/text_semantics.py
CrossOntologyTextSemanticsCorpus(class_mappings, src_onto, tgt_onto, annotation_property_iris, negative_ratio=4)
Class for creating the cross-ontology text semantics corpus from two ontologies and provided mappings between them.
As defined in the \(\textsf{BERTMap}\) paper, the cross-ontology text semantics corpus consists of synonym and non-synonym pairs extracted from the annotations/labels of class pairs involved in the provided cross-ontology mappings.
Attributes:
| Name | Type | Description |
|---|---|---|
| `class_mappings` | `List[ReferenceMapping]` | A list of cross-ontology class mappings. |
| `src_onto` | `Ontology` | The source ontology whose class IRIs are heads of the `class_mappings`. |
| `tgt_onto` | `Ontology` | The target ontology whose class IRIs are tails of the `class_mappings`. |
| `annotation_property_iris` | `List[str]` | A list of annotation property IRIs used to extract the annotations. |
| `negative_ratio` | `int` | The expected negative sample ratio of the non-synonyms to the extracted synonyms. Defaults to `4`. |
Source code in src/deeponto/align/bertmap/text_semantics.py
save(save_path)
Save the cross-ontology corpus (a `.json` file for label pairs and its summary) in the specified directory.
Source code in src/deeponto/align/bertmap/text_semantics.py
synonym_sampling_from_mappings()
Sample synonyms from cross-ontology class mappings.
Arguments of this method are all class attributes. See `CrossOntologyTextSemanticsCorpus`.
According to the \(\textsf{BERTMap}\) paper, cross-ontology synonyms are defined as label pairs that belong to two matched classes. Suppose the class \(C\) from the source ontology and the class \(D\) from the target ontology are matched according to one of the `class_mappings`; then the cartesian product of the labels of \(C\) and the labels of \(D\) forms the cross-ontology synonyms. Note that identity synonyms of the form \((a, a)\) are removed because they have been covered in the intra-ontology case.
Returns:
| Type | Description |
|---|---|
| `List[Tuple[str, str]]` | A list of unique synonym pair samples from ontology class mappings. |
Source code in src/deeponto/align/bertmap/text_semantics.py
nonsynonym_sampling_from_mappings(num_samples, max_iter=5)
Sample non-synonyms from cross-ontology class mappings.
Arguments of this method are all class attributes. See `CrossOntologyTextSemanticsCorpus`.
According to the \(\textsf{BERTMap}\) paper, cross-ontology non-synonyms are defined as label pairs that belong to two unmatched classes. Assuming that the provided class mappings are self-contained in the sense that they are complete for the classes involved in them, we can randomly sample two cross-ontology classes that are not matched according to the mappings and take their labels as non-synonyms. In practice, false negatives are quite unlikely since the number of incorrect mappings is much larger than the number of correct ones.
Returns:
| Type | Description |
|---|---|
| `List[Tuple[str, str]]` | A list of unique non-synonym pair samples from ontology class mappings. |
Source code in src/deeponto/align/bertmap/text_semantics.py
TextSemanticsCorpora(src_onto, tgt_onto, annotation_property_iris, class_mappings=None, auxiliary_ontos=None)
Class for creating the collection of text semantics corpora.
As defined in the \(\textsf{BERTMap}\) paper, the collection of text semantics corpora contains at least two intra-ontology sub-corpora from the source and target ontologies, respectively. If some class mappings are provided, then a cross-ontology sub-corpus will be created. If some additional auxiliary ontologies are provided, the intra-ontology corpora created from them will serve as the auxiliary sub-corpora.
Attributes:
| Name | Type | Description |
|---|---|---|
| `src_onto` | `Ontology` | The source ontology to be matched or aligned. |
| `tgt_onto` | `Ontology` | The target ontology to be matched or aligned. |
| `annotation_property_iris` | `List[str]` | A list of annotation property IRIs used to extract the annotations. |
| `class_mappings` | `List[ReferenceMapping]` | A list of cross-ontology class mappings between the source and the target ontologies. Defaults to `None`. |
| `auxiliary_ontos` | `List[Ontology]` | A list of auxiliary ontologies for augmenting more synonym/non-synonym samples. Defaults to `None`. |
Source code in src/deeponto/align/bertmap/text_semantics.py
save(save_path)
Save the overall text semantics corpora (a `.json` file for label pairs and its summary) in the specified directory.
Source code in src/deeponto/align/bertmap/text_semantics.py
add_samples_from_sub_corpus(sub_corpus)
Add synonyms and non-synonyms from each sub-corpus to the overall collection.
Source code in src/deeponto/align/bertmap/text_semantics.py
BERTSynonymClassifier(loaded_path, output_path, eval_mode, max_length_for_input, num_epochs_for_training=None, batch_size_for_training=None, batch_size_for_prediction=None, training_data=None, validation_data=None)
Class for BERT synonym classifier.
The main scoring module of \(\textsf{BERTMap}\) consisting of a BERT model and a binary synonym classifier.
Attributes:
| Name | Type | Description |
|---|---|---|
| `loaded_path` | `str` | The path to the checkpoint of a pre-trained BERT model. |
| `output_path` | `str` | The path to the output BERT model (usually fine-tuned). |
| `eval_mode` | `bool` | Whether the model is loaded in evaluation mode (for prediction only) rather than for training. |
| `max_length_for_input` | `int` | The maximum length of an input sequence. |
| `num_epochs_for_training` | `int` | The number of epochs for training a BERT model. |
| `batch_size_for_training` | `int` | The batch size for training a BERT model. |
| `batch_size_for_prediction` | `int` | The batch size for making predictions. |
| `training_data` | `Dataset` | Data for training the model (needed when not in evaluation mode). |
| `validation_data` | `Dataset` | Data for validating the model (needed when not in evaluation mode). |
| `training_args` | `TrainingArguments` | Training arguments for training the model (used when not in evaluation mode). |
| `trainer` | `Trainer` | The model trainer fed with the training and validation data. |
| `softmax` | `torch.nn.Softmax` | The softmax layer used for normalising synonym scores. |
Source code in src/deeponto/align/bertmap/bert_classifier.py
train(resume_from_checkpoint=None)
Start training the BERT model.
Source code in src/deeponto/align/bertmap/bert_classifier.py
eval()
Switch the model to eval mode.
Source code in src/deeponto/align/bertmap/bert_classifier.py
predict(sent_pairs)
Run the prediction pipeline for synonym classification.
Return the softmax probabilities of predicting the pairs as synonyms (`index=1`).
Source code in src/deeponto/align/bertmap/bert_classifier.py
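A sketch of the scoring idea with Hugging Face `transformers` (the checkpoint name is a placeholder and the classifier is untrained here; the actual class wraps this logic around its fine-tuned model):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # placeholder checkpoint
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
model.eval()

def synonym_scores(sent_pairs):
    # Encode annotation pairs as BERT sentence pairs.
    inputs = tokenizer([p[0] for p in sent_pairs], [p[1] for p in sent_pairs],
                       padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Softmax probability of the "synonym" class (index=1).
    return torch.softmax(logits, dim=-1)[:, 1]

print(synonym_scores([("heart attack", "myocardial infarction")]))
```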
load_dataset(data, split)
Load the list of `(annotation1, annotation2, label)` samples into a `datasets.Dataset`.
Source code in src/deeponto/align/bertmap/bert_classifier.py
process_inputs(sent_pairs)
Process input sentence pairs for the BERT model.
Transform the sentences into BERT input embeddings and load them into the device.
This function is called only when the BERT model is about to make predictions (`eval` mode).
Source code in src/deeponto/align/bertmap/bert_classifier.py
compute_metrics(pred)
staticmethod
Add more evaluation metrics into the training log.
Source code in src/deeponto/align/bertmap/bert_classifier.py
get_device(device_num=0)
staticmethod
Get a device (GPU or CPU) for the torch model.
Source code in src/deeponto/align/bertmap/bert_classifier.py
set_seed(seed_val=888)
staticmethod
Set random seed for reproducible results.
Source code in src/deeponto/align/bertmap/bert_classifier.py
MappingPredictor(output_path, tokenizer_path, src_annotation_index, tgt_annotation_index, bert_synonym_classifier, num_raw_candidates, num_best_predictions, batch_size_for_prediction, logger, enlighten_manager, enlighten_status, ignored_class_index=None)
Class for the mapping prediction module of \(\textsf{BERTMap}\) and \(\textsf{BERTMapLt}\) models.
Attributes:
| Name | Type | Description |
|---|---|---|
| `tokenizer` | `Tokenizer` | The tokenizer used for constructing the inverted annotation index and candidate selection. |
| `src_annotation_index` | `dict` | A dictionary that stores the class annotations of the source ontology, indexed by class IRIs. |
| `tgt_annotation_index` | `dict` | A dictionary that stores the class annotations of the target ontology, indexed by class IRIs. |
| `tgt_inverted_annotation_index` | `InvertedIndex` | The inverted index built from `tgt_annotation_index`, used for target candidate selection. |
| `bert_synonym_classifier` | `BERTSynonymClassifier` | The BERT synonym classifier fine-tuned on text semantics corpora. |
| `num_raw_candidates` | `int` | The maximum number of selected target class candidates for a source class. |
| `num_best_predictions` | `int` | The maximum number of best scored mappings preserved for a source class. |
| `batch_size_for_prediction` | `int` | The batch size of class annotation pairs for computing synonym scores. |
| `ignored_class_index` | `dict` | OAEI argument; a dictionary that stores the classes to be ignored during matching. Defaults to `None`. |
Source code in src/deeponto/align/bertmap/mapping_prediction.py
bert_mapping_score(src_class_annotations, tgt_class_annotations)
\(\textsf{BERTMap}\)'s main mapping score module which utilises the fine-tuned BERT synonym classifier.
Compute the synonym score for each pair of src-tgt class annotations, and return the average score as the mapping score. Apply string matching before applying the BERT module to filter easy mappings (with scores \(1.0\)).
Source code in src/deeponto/align/bertmap/mapping_prediction.py
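Conceptually, the mapping score is the average synonym probability over the cartesian product of the two classes' annotations. A sketch (the `classifier.predict` call is a hypothetical stand-in for the fine-tuned synonym classifier, returning one probability per pair):

```python
from itertools import product

def bert_mapping_score(src_class_annotations, tgt_class_annotations, classifier):
    # Score every src-tgt annotation pair with the synonym classifier and
    # average the probabilities to obtain the class-level mapping score.
    pairs = list(product(src_class_annotations, tgt_class_annotations))
    scores = classifier.predict(pairs)  # hypothetical: one synonym probability per pair
    return sum(float(s) for s in scores) / len(scores)
```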
edit_similarity_mapping_score(src_class_annotations, tgt_class_annotations, string_match_only=False)
staticmethod
\(\textsf{BERTMap}\)'s string match module and \(\textsf{BERTMapLt}\)'s mapping prediction function.
Compute the normalised edit similarity (1 - normalised edit distance) for each pair of src-tgt class annotations, and return the maximum score as the mapping score.
Source code in src/deeponto/align/bertmap/mapping_prediction.py
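A sketch of the normalised edit similarity score using the `Levenshtein` package (an assumed dependency for illustration); the maximum over all annotation pairs is taken as the mapping score:

```python
from itertools import product

import Levenshtein  # assumed dependency (pip install python-Levenshtein)

def edit_similarity_mapping_score(src_class_annotations, tgt_class_annotations):
    # 1 - normalised edit distance for each annotation pair; keep the maximum.
    def edit_similarity(a, b):
        return 1.0 - Levenshtein.distance(a, b) / max(len(a), len(b))
    return max(edit_similarity(a, b)
               for a, b in product(src_class_annotations, tgt_class_annotations))
```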
mapping_prediction_for_src_class(src_class_iri)
Predict \(N\) best scored mappings for a source ontology class, where \(N\) is specified in `self.num_best_predictions`.
- Apply the string matching module to compute "easy" mappings.
- Return the mappings if any are found, or if there is no BERT synonym classifier as in \(\textsf{BERTMapLt}\).
- If using the BERT synonym classifier module:
    - Generate batches of class annotation pairs. Each batch contains the combinations of the source class annotations and \(M\) target candidate classes' annotations, where \(M\) is determined by `batch_size_for_prediction`, i.e., stop adding annotations of a target class candidate into the current batch if this operation would cause the size of the current batch to exceed the limit.
    - Compute the synonym scores for each batch and aggregate them into mapping scores; preserve the \(N\) best scored candidates and update them with the next batch. Through this dynamic process, we eventually obtain the \(N\) best scored mappings for the source ontology class.
Source code in src/deeponto/align/bertmap/mapping_prediction.py
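A sketch of the dynamic top-\(N\) bookkeeping across batches using a heap (candidate generation and scoring are placeholders; only the "preserve the \(N\) best and update them" step is shown):

```python
import heapq

def keep_best_predictions(scored_mappings, num_best_predictions):
    # scored_mappings: iterable of (score, target_class_iri) produced batch by batch.
    best = []  # min-heap holding at most N best scored candidates
    for score, tgt_class_iri in scored_mappings:
        if len(best) < num_best_predictions:
            heapq.heappush(best, (score, tgt_class_iri))
        elif score > best[0][0]:
            # Replace the current worst of the N best candidates.
            heapq.heapreplace(best, (score, tgt_class_iri))
    return sorted(best, reverse=True)
```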
mapping_prediction()
Apply global matching for each class in the source ontology.
See `mapping_prediction_for_src_class`.
If this process is accidentally stopped, it can be resumed from already saved predictions. The progress bar keeps track of the number of source ontology classes that have been matched.
Source code in src/deeponto/align/bertmap/mapping_prediction.py
MappingRefiner(output_path, src_onto, tgt_onto, mapping_predictor, mapping_extension_threshold, mapping_filtered_threshold, logger, enlighten_manager, enlighten_status)
Class for the mapping refinement module of \(\textsf{BERTMap}\).
\(\textsf{BERTMapLt}\) does not go through mapping refinement for its being "light".
All the attributes of this class are supposed to be passed from `BERTMapPipeline`.
Attributes:
| Name | Type | Description |
|---|---|---|
| `src_onto` | `Ontology` | The source ontology to be matched. |
| `tgt_onto` | `Ontology` | The target ontology to be matched. |
| `mapping_predictor` | `MappingPredictor` | The mapping prediction module of BERTMap. |
| `mapping_extension_threshold` | `float` | Mappings with scores \(\geq\) this value will be considered in the iterative mapping extension process. |
| `raw_mappings` | `List[EntityMapping]` | List of raw class mappings predicted in the global matching phase. |
| `mapping_score_dict` | `dict` | A dynamic dictionary that keeps track of mappings (with scores) that have already been computed. |
| `mapping_filter_threshold` | `float` | Mappings with scores \(\geq\) this value will be preserved for the final mapping repairing. |
Source code in src/deeponto/align/bertmap/mapping_refinement.py
mapping_extension(max_iter=10)
Iterative mapping extension based on the locality principle.
For each class pair \((c, c')\) (scored in the global matching phase) with score \(\geq \kappa\), search for plausible mappings between the parents of \(c\) and \(c'\), and between the children of \(c\) and \(c'\). This is an iterative process, as the set of newly discovered mappings can renew the frontier for searching. Terminate if no new mappings with score \(\geq \kappa\) can be found or the limit `max_iter` has been reached. Note that \(\kappa\) is set to \(0.9\) by default (and can be altered in the configuration file). The mapping extension progress bar keeps track of the total number of extended mappings (including the previously predicted ones).
A further filtering step preserves only mappings with score \(\geq \lambda\). In the original BERTMap paper, \(\lambda\) is determined by the validation mappings, but in practice \(\lambda\) is not a sensitive hyperparameter and validation mappings are often not available. Therefore, we manually set \(\lambda\) to \(0.9995\) by default (it can be altered in the configuration file). The mapping filtering progress bar keeps track of the total number of filtered mappings (this bar is purely for logging purposes).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `max_iter` | `int` | The maximum number of mapping extension iterations. Defaults to `10`. | `10` |
Source code in src/deeponto/align/bertmap/mapping_refinement.py
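A sketch of the iterative frontier-based extension loop (the `one_hop_extend` argument stands in for the method documented below, and mapping objects are simplified to `(src_iri, tgt_iri, score)` tuples):

```python
def mapping_extension(frontier, one_hop_extend, kappa=0.9, max_iter=10):
    # frontier: list of (src_iri, tgt_iri, score) with score >= kappa from global matching.
    extended = []
    for _ in range(max_iter):
        new_frontier = []
        for src_iri, tgt_iri, _ in frontier:
            # Search parents/children of the matched pair for new high-scoring mappings.
            for candidate in one_hop_extend(src_iri, tgt_iri):
                if candidate[2] >= kappa:
                    extended.append(candidate)
                    new_frontier.append(candidate)
        if not new_frontier:  # terminate when no new mappings are found
            break
        frontier = new_frontier
    return extended
```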
one_hop_extend(src_class_iri, tgt_class_iri, pool_size=200)
Extend mappings from a scored class pair \((c, c')\) by searching from one-hop neighbors.
Search for plausible mappings between the parents of \(c\) and \(c'\), and between the children of \(c\) and \(c'\). Mappings that are not already computed (recorded in `self.mapping_score_dict`) and have a score \(\geq\) `self.mapping_extension_threshold` will be returned as new mappings.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `src_class_iri` | `str` | The IRI of the source ontology class \(c\). | required |
| `tgt_class_iri` | `str` | The IRI of the target ontology class \(c'\). | required |
| `pool_size` | `int` | The maximum number of plausible mappings to be extended. Defaults to `200`. | `200` |
Returns:
| Type | Description |
|---|---|
| `List[EntityMapping]` | A list of one-hop extended mappings. |
Source code in src/deeponto/align/bertmap/mapping_refinement.py
mapping_repair()
Repair the filtered mappings with LogMap's debugger.
Note
A sub-folder under `match` named `logmap-repair` contains LogMap-related intermediate files.
Source code in src/deeponto/align/bertmap/mapping_refinement.py
logmap_repair_formatting()
Transform the filtered mapping file into the LogMap format.
An auxiliary function of the mapping repair module which requires mappings to be formatted as LogMap's input format.
Source code in src/deeponto/align/bertmap/mapping_refinement.py
Created: January 13, 2023