Ontology Verbalisation
Verbalising an ontology into natural language text is a challenging task. \(\textsf{DeepOnto}\) provides some basic building blocks for achieving this goal. The implemented OntologyVerbaliser is essentially a recursive concept verbaliser: it first splits a complex concept \(C\) into a sub-formula tree, verbalises the leaf nodes (atomic concepts or object properties) by their names, and then merges the verbalised child nodes according to the logical pattern at their parent node.
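The recursion described above can be sketched in a few lines. This is a minimal illustration, not DeepOnto's implementation: the tuple-based concept encoding, the verbalise function and the example IRIs and names are all assumptions made for this sketch, and only the atomic, negation and restriction patterns are covered.

```python
from typing import Dict, Tuple, Union

# Toy concept encoding (an assumption of this sketch, not DeepOnto's data model):
#   "ex:A"            an atomic concept, given by its IRI
#   ("not", C)        negation
#   ("some", r, C)    existential restriction
#   ("only", r, C)    universal restriction
Concept = Union[str, Tuple]


def verbalise(concept: Concept, vocab: Dict[str, str], add_quantifier_word: bool = False) -> str:
    """Verbalise the leaves by name, then merge children by the parent's pattern."""
    if isinstance(concept, str):  # leaf node: look up the entity name
        return vocab[concept]
    operator = concept[0]
    if operator == "not":
        return "not " + verbalise(concept[1], vocab, add_quantifier_word)
    if operator in ("some", "only"):
        _, prop, filler = concept
        words = ["something that", vocab[prop]]
        if add_quantifier_word:  # the quantifier word is optional
            words.append(operator)
        words.append(verbalise(filler, vocab, add_quantifier_word))
        return " ".join(words)
    raise RuntimeError(f"Unsupported pattern: {operator!r}")


# Hypothetical vocabulary mapping IRIs to names.
vocab = {"ex:derivesFrom": "derives from", "ex:Cephalopod": "cephalopod"}
concept = ("some", "ex:derivesFrom", ("not", "ex:Cephalopod"))
print(verbalise(concept, vocab))
print(verbalise(concept, vocab, add_quantifier_word=True))
```

Each recursive call returns a plain string here; the real verbaliser returns a nested result object instead, so this sketch only mirrors the control flow.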
Please cite the following paper if you consider using our verbaliser.
Paper
The recursive concept verbaliser is proposed in the paper: Language Model Analysis for Ontology Subsumption Inference (Findings of ACL 2023).
@inproceedings{he-etal-2023-language,
title = "Language Model Analysis for Ontology Subsumption Inference",
author = "He, Yuan and
Chen, Jiaoyan and
Jimenez-Ruiz, Ernesto and
Dong, Hang and
Horrocks, Ian",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2023",
month = jul,
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.findings-acl.213",
doi = "10.18653/v1/2023.findings-acl.213",
pages = "3439--3453"
}
OntologyVerbaliser(onto, apply_lowercasing=False, keep_iri=False, apply_auto_correction=False, add_quantifier_word=False)
A recursive natural language verbaliser for OWL logical expressions, e.g., OWLAxiom and OWLClassExpression.
The concept patterns supported by this verbaliser are shown below:
Pattern | Verbalisation (\(\mathcal{V}\)) |
---|---|
\(A\) (atomic) | the name (\(\texttt{rdfs:label}\)) of \(A\) (auto-correction is optional) |
\(r\) (property) | the name (\(\texttt{rdfs:label}\)) of \(r\) (auto-correction is optional) |
\(\neg C\) | "not \(\mathcal{V}(C)\)" |
\(\exists r.C\) | "something that \(\mathcal{V}(r)\) some \(\mathcal{V}(C)\)" (the quantifier word "some" is optional) |
\(\forall r.C\) | "something that \(\mathcal{V}(r)\) only \(\mathcal{V}(C)\)" (the quantifier word "only" is optional) |
\(C_1 \sqcap ... \sqcap C_n\) | if \(C_i = \exists/\forall r.D_i\) and \(C_j = \exists/\forall r.D_j\), they will be re-written into \(\exists/\forall r.(D_i \sqcap D_j)\) before verbalisation; suppose after re-writing the new expression is \(C_1 \sqcap ... \sqcap C_{n'}\) (a) if all \(C_i\)s (for \(i = 1, ..., n'\)) are restrictions, in the form of \(\exists/\forall r_i.D_i\): |
\(C_1 \sqcup ... \sqcup C_n\) | similar to verbalising \(C_1 \sqcap ... \sqcap C_n\) except that "and" is replaced by "or" and case (b) uses the same verbalisation as case (c) |
\(r_1 \cdot r_2\) (property chain) | \(\mathcal{V}(r_1)\) something that \(\mathcal{V}(r_2)\) |
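The re-writing step for conjunctions in the table above, which groups restrictions over the same property before verbalisation, could look roughly like the sketch below. The tuple encoding, the merge_restrictions name and the choice to list non-restriction conjuncts first are assumptions of this illustration; it also assumes that only restrictions sharing both the quantifier and the property are merged.

```python
from typing import Dict, List, Tuple, Union

# Toy encoding (hypothetical): a string is an atomic concept,
# ("some", r, D) is an existential and ("only", r, D) a universal restriction,
# ("and", D1, ..., Dn) is a conjunction.
Concept = Union[str, Tuple]


def merge_restrictions(conjuncts: List[Concept]) -> List[Concept]:
    """Re-write Qr.D_i and Qr.D_j (same quantifier Q, same property r) into Qr.(D_i and D_j)."""
    grouped: Dict[Tuple[str, str], List[Concept]] = {}
    others: List[Concept] = []
    for conjunct in conjuncts:
        if isinstance(conjunct, tuple) and conjunct[0] in ("some", "only"):
            quantifier, prop, filler = conjunct
            grouped.setdefault((quantifier, prop), []).append(filler)
        else:
            others.append(conjunct)
    merged = list(others)  # non-restrictions first (an ordering choice of this sketch)
    for (quantifier, prop), fillers in grouped.items():
        filler = fillers[0] if len(fillers) == 1 else ("and", *fillers)
        merged.append((quantifier, prop, filler))
    return merged


print(merge_restrictions([("some", "r", "D1"), "A", ("some", "r", "D2"), ("only", "r", "D3")]))
```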
With this concept verbaliser, a range of OWL axioms are supported:
- Class axioms for subsumption, equivalence, assertion.
- Object property axioms for subsumption, assertion.
The verbaliser operates at the concept level, and an additional template is needed to integrate the verbalised components of an axiom.
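To make the role of such a template concrete, here is a small sketch that joins two already verbalised concepts into a sentence. The function names and the wording of the templates are hypothetical and chosen only for illustration; they are not templates shipped with the verbaliser.

```python
def subsumption_sentence(sub_verbal: str, super_verbal: str) -> str:
    """Join a verbalised sub-concept and super-concept (hypothetical template)."""
    return f"{sub_verbal} is a kind of {super_verbal}."


def equivalence_sentence(left_verbal: str, right_verbal: str) -> str:
    """Join two verbalised equivalent concepts (hypothetical template)."""
    return f"{left_verbal} is defined as {right_verbal}."


# The input strings stand in for the verbalised components of an axiom.
print(subsumption_sentence("cephalopod food product", "mollusk food product"))
```

Keeping the template outside the verbaliser means the same verbalised components can be reused with different phrasings.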
Warning
This verbaliser uses spaCy for the POS tagging that underlies the auto-correction of property names. The spaCy pipeline en_core_web_sm is downloaded automatically in the init function. If it still cannot be found, download it manually with python -m spacy download en_core_web_sm.
Attributes:
Name | Type | Description |
---|---|---|
onto | Ontology | An ontology whose entities and axioms are to be verbalised. |
parser | OntologySyntaxParser | A syntax parser for the string representation of OWL logical expressions. |
vocab | dict[str, list[str]] | A dictionary that maps entity IRIs to entity names (see update_entity_name). |
apply_lowercasing | bool | Whether to apply lowercasing to the entity names. Defaults to False. |
keep_iri | bool | Whether to keep the IRIs of entities rather than verbalising them. Defaults to False. |
apply_auto_correction | bool | Whether to automatically apply rule-based auto-correction to entity names. Defaults to False. |
add_quantifier_word | bool | Whether to add quantifier words ("some"/"only") as in the Manchester syntax. Defaults to False. |
Parameters:
Name | Type | Description | Default |
---|---|---|---|
onto | Ontology | An ontology whose entities and axioms are to be verbalised. | required |
apply_lowercasing | bool | Whether to apply lowercasing to the entity names. | False |
keep_iri | bool | Whether to keep the IRIs of entities rather than verbalising them. | False |
apply_auto_correction | bool | Whether to automatically apply rule-based auto-correction to entity names. | False |
add_quantifier_word | bool | Whether to add quantifier words ("some"/"only") as in the Manchester syntax. | False |
Source code in src/deeponto/onto/verbalisation.py
update_entity_name(entity_iri, entity_name)
Update the name of an entity in self.vocab.
If you want to change the name of a specific entity, call this function before applying verbalisation.
verbalise_class_expression(class_expression)
Verbalise a class expression (OWLClassExpression) or its parsed form (a RangeNode).
See the concept pattern table of OntologyVerbaliser above for the currently supported types of class (or concept) expressions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
class_expression | Union[OWLClassExpression, str, RangeNode] | A class expression to be verbalised. | required |
Raises:
Type | Description |
---|---|
RuntimeError | Raised when the class expression is not in one of the supported types. |
Returns:
Type | Description |
---|---|
CfgNode | A nested dictionary that presents the recursive results of verbalisation, including the verbalised string. |
verbalise_class_subsumption_axiom(class_subsumption_axiom)
Verbalise a class subsumption axiom.
The subsumption axiom can have two forms:
- \(C_{sub} \sqsubseteq C_{super}\), the SubClassOf axiom;
- \(C_{super} \sqsupseteq C_{sub}\), the SuperClassOf axiom.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
class_subsumption_axiom | OWLAxiom | The class subsumption axiom to be verbalised. | required |
Returns:
Type | Description |
---|---|
Tuple[CfgNode, CfgNode] | The verbalised sub-concept \(\mathcal{V}(C_{sub})\) and super-concept \(\mathcal{V}(C_{super})\) (order matters). |
verbalise_class_equivalence_axiom(class_equivalence_axiom)
Verbalise a class equivalence axiom.
The equivalence axiom has the form \(C \equiv D\).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
class_equivalence_axiom | OWLAxiom | The class equivalence axiom to be verbalised. | required |
Returns:
Type | Description |
---|---|
Tuple[CfgNode, CfgNode] | The verbalised concept \(\mathcal{V}(C)\) and its equivalent concept \(\mathcal{V}(D)\) (order matters). |
verbalise_class_assertion_axiom(class_assertion_axiom)
Verbalise a class assertion axiom.
The class assertion axiom has the form \(C(x)\).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
class_assertion_axiom | OWLAxiom | The class assertion axiom to be verbalised. | required |
Returns:
Type | Description |
---|---|
Tuple[CfgNode, CfgNode] | The verbalised class \(\mathcal{V}(C)\) and individual \(\mathcal{V}(x)\) (order matters). |
verbalise_object_property_subsumption_axiom(object_property_subsumption_axiom)
Verbalise an object property subsumption axiom.
The subsumption axiom can have two forms:
- \(r_{sub} \sqsubseteq r_{super}\), the SubObjectPropertyOf axiom;
- \(r_{super} \sqsupseteq r_{sub}\), the SuperObjectPropertyOf axiom.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
object_property_subsumption_axiom | OWLAxiom | The object property subsumption axiom to be verbalised. | required |
Returns:
Type | Description |
---|---|
Tuple[CfgNode, CfgNode] | The verbalised sub-property \(\mathcal{V}(r_{sub})\) and super-property \(\mathcal{V}(r_{super})\) (order matters). |
verbalise_object_property_assertion_axiom(object_property_assertion_axiom)
Verbalise an object property assertion axiom.
The object property assertion axiom has the form \(r(x, y)\).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
object_property_assertion_axiom | OWLAxiom | The object property assertion axiom to be verbalised. | required |
Returns:
Type | Description |
---|---|
Tuple[CfgNode, CfgNode] | The verbalised object property \(\mathcal{V}(r)\) and two individuals \(\mathcal{V}(x)\) and \(\mathcal{V}(y)\) (order matters). |
verbalise_object_property_domain_axiom(object_property_domain_axiom)
Verbalise an object property domain axiom.
The domain of a property \(r: X \rightarrow Y\) specifies the concept expression \(X\) of its subject.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
object_property_domain_axiom | OWLAxiom | The object property domain axiom to be verbalised. | required |
Returns:
Type | Description |
---|---|
Tuple[CfgNode, CfgNode] | The verbalised object property \(\mathcal{V}(r)\) and its domain \(\mathcal{V}(X)\) (order matters). |
verbalise_object_property_range_axiom(object_property_range_axiom)
Verbalise an object property range axiom.
The range of a property \(r: X \rightarrow Y\) specifies the concept expression \(Y\) of its object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
object_property_range_axiom | OWLAxiom | The object property range axiom to be verbalised. | required |
Returns:
Type | Description |
---|---|
Tuple[CfgNode, CfgNode] | The verbalised object property \(\mathcal{V}(r)\) and its range \(\mathcal{V}(Y)\) (order matters). |
OntologySyntaxParser()
A syntax parser for OWL logical expressions, e.g., OWLAxiom and OWLClassExpression.
It makes use of the string representation (based on Manchester Syntax) defined in the OWLAPI. In Python, such a string can be accessed by simply using str(some_owl_object).
To keep the Java import in the main Ontology class, this parser does not deal with OWLAxiom directly but with its string representation.
Due to the OWLObject syntax, this parser relies on two components:
- Parentheses matching;
- Tree construction (RangeNode).
As a result, it returns a RangeNode that specifies the sub-formulas (and their respective positions in the string representation) in a tree structure.
Examples:
Suppose the input is an OWLAxiom that has the string representation:
>>> str(owl_axiom)
'EquivalentClasses(<http://purl.obolibrary.org/obo/FOODON_00001707> ObjectIntersectionOf(<http://purl.obolibrary.org/obo/FOODON_00002044> ObjectSomeValuesFrom(<http://purl.obolibrary.org/obo/RO_0001000> <http://purl.obolibrary.org/obo/FOODON_03412116>)) )'
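This corresponds to the following logical expression, where \(A\), \(B\), \(r\) and \(C\) stand for FOODON_00001707, FOODON_00002044, RO_0001000 and FOODON_03412116, respectively:
\(A \equiv B \sqcap \exists r.C\)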
After applying the parser, a RangeNode is returned, which can be rendered as:
axiom_parser = OntologySyntaxParser()
print(axiom_parser.parse(str(owl_axiom)).render_tree())
Output:
Root@[0:inf]
└── EQV@[0:212]
    ├── FOODON_00001707@[6:54]
    └── AND@[55:210]
        ├── FOODON_00002044@[61:109]
        └── EX.@[110:209]
            ├── RO_0001000@[116:159]
            └── FOODON_03412116@[160:208]
Or, if graphviz (installed by, e.g., sudo apt install graphviz) is available, you can visualise the tree as an image by:
axiom_parser.parse(str(owl_axiom)).render_image()
The name of each node has the form {node_type}@[{start}:{end}], which means that a node of the type {node_type} is located at the range [{start}:{end}] in the abbreviated expression (see abbreviate_owl_expression below).
The leaf nodes are IRIs and they are represented by the last segment (split by "/") of the whole IRI.
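The naming scheme can be reproduced with a short helper. This is a sketch under stated assumptions: node_name is a hypothetical function, and it assumes that an operator node's span starts with a bracketed abbreviation such as [AND], as in the example output above.

```python
def node_name(abbreviated: str, start: int, end: int) -> str:
    """Build a {node_type}@[{start}:{end}] label for a span of an abbreviated expression."""
    span = abbreviated[start:end]
    if span.startswith("<") and span.endswith(">"):
        # Leaf node: keep only the last "/"-separated segment of the IRI.
        node_type = span[1:-1].split("/")[-1]
    else:
        # Operator node: take the text between the leading "[" and its "]".
        node_type = span[1:span.index("]")]
    return f"{node_type}@[{start}:{end}]"


expr = "[EX.](<http://purl.obolibrary.org/obo/RO_0001000> <http://purl.obolibrary.org/obo/FOODON_03412116>)"
first_iri_start = expr.index("<")
first_iri_end = expr.index(">") + 1
print(node_name(expr, first_iri_start, first_iri_end))
print(node_name(expr, 0, len(expr)))
```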
Child nodes can be accessed by .children, and the string representation of the sub-formula in a node can be accessed by .text. For example:
axiom_parser.parse(str(owl_axiom)).children[0].children[1].text
Output:
'[AND](<http://purl.obolibrary.org/obo/FOODON_00002044> [EX.](<http://purl.obolibrary.org/obo/RO_0001000> <http://purl.obolibrary.org/obo/FOODON_03412116>))'
abbreviate_owl_expression(owl_expression)
Abbreviate the string representations of logical operators to a fixed length (easier for parsing).
The abbreviations are specified at deeponto.onto.verbalisation.ABBREVIATION_DICT.
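As an illustration of the idea, the sketch below replaces operator names with fixed-length abbreviations. The three entries are inferred from the example output shown earlier (EQV, AND, EX.); the dictionary is only an assumed subset written for this sketch, not the contents of ABBREVIATION_DICT, and abbreviate is a hypothetical helper.

```python
# Assumed subset of operator abbreviations, inferred from the example output.
ILLUSTRATIVE_ABBREVIATIONS = {
    "EquivalentClasses": "[EQV]",
    "ObjectIntersectionOf": "[AND]",
    "ObjectSomeValuesFrom": "[EX.]",
}


def abbreviate(owl_expression: str) -> str:
    """Replace each known operator name with its five-character abbreviation."""
    for full_name, short_name in ILLUSTRATIVE_ABBREVIATIONS.items():
        owl_expression = owl_expression.replace(full_name, short_name)
    return owl_expression


print(abbreviate("ObjectIntersectionOf(<a> ObjectSomeValuesFrom(<r> <b>))"))
```

Because every abbreviation has the same length, the operator of a sub-formula always occupies a known number of characters before its opening parenthesis. A plain string replacement like this would also rewrite an IRI that happens to contain an operator name, which a careful implementation would need to guard against.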
Parameters:
Name | Type | Description | Default |
---|---|---|---|
owl_expression | str | The string representation of an OWL expression. | required |
Returns:
Type | Description |
---|---|
str | The modified string representation of this expression, with the logical operators abbreviated. |
parse(owl_expression)
Parse an OWLAxiom into a RangeNode.
This is the main entry point for using the parser; it relies on the parse_by_parentheses method below.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
owl_expression | Union[str, OWLObject] | An OWLObject or its string representation. | required |
Returns:
Type | Description |
---|---|
RangeNode | A parsed syntactic tree built from the matched parentheses. |
parse_by_parentheses(owl_expression, already_parsed=None, for_iri=False)
classmethod
Parse an OWLAxiom based on parentheses matching into a RangeNode.
This function needs to be applied twice to get a fully parsed RangeNode because IRIs have a different parenthesis pattern.
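A stack-based pass is one common way to do this kind of matching. The sketch below is an assumption-laden illustration rather than the method's actual code: match_parentheses is a hypothetical helper that returns plain (start, end) pairs with an exclusive end, and its for_iri flag switches from round parentheses to the angle brackets that enclose IRIs.

```python
from typing import List, Tuple


def match_parentheses(text: str, for_iri: bool = False) -> List[Tuple[int, int]]:
    """Return the (start, end) span of every matched pair, end exclusive."""
    left, right = ("<", ">") if for_iri else ("(", ")")
    open_positions: List[int] = []
    spans: List[Tuple[int, int]] = []
    for position, char in enumerate(text):
        if char == left:
            open_positions.append(position)
        elif char == right:
            if not open_positions:
                raise RuntimeError(f"Unmatched {right!r} at position {position}")
            spans.append((open_positions.pop(), position + 1))
    if open_positions:
        raise RuntimeError(f"Unmatched {left!r} at position {open_positions[-1]}")
    return sorted(spans)


text = "[AND](<a> [EX.](<r> <b>))"
print(match_parentheses(text))                # first pass: sub-formulas
print(match_parentheses(text, for_iri=True))  # second pass: IRIs
```

Running the two passes separately mirrors the note above that one pass alone does not yield a fully parsed tree.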
Parameters:
Name | Type | Description | Default |
---|---|---|---|
owl_expression | str | The string representation of an OWLAxiom. | required |
already_parsed | RangeNode | A partially parsed RangeNode. | None |
for_iri | bool | Whether to match the angle brackets that enclose IRIs instead of the default round parentheses. | False |
Raises:
Type | Description |
---|---|
RuntimeError | Raised when the input axiom text is not properly formatted. |
Returns:
Type | Description |
---|---|
RangeNode | A parsed syntactic tree built from the matched parentheses. |
RangeNode(start, end, name=None, **kwargs)
Bases: NodeMixin
A tree implementation for ranges (without partial overlap).
- A parent node's range fully covers its child node's range, e.g., [1, 10] is a parent of [2, 5].
- Partial overlap between ranges is not allowed, e.g., [2, 4] and [3, 5] cannot appear in the same RangeNodeTree.
- Non-overlapping ranges are on different branches (irrelevant).
- Child nodes are ordered according to their relative positions.
__gt__(other)
Compare two ranges if they have a different start and/or a different end.
- \(R_1 \lt R_2\): if range \(R_1\) is completely contained in range \(R_2\), and \(R_1 \neq R_2\).
- \(R_1 \gt R_2\): if range \(R_2\) is completely contained in range \(R_1\), and \(R_1 \neq R_2\).
- "irrelevant": if range \(R_1\) and range \(R_2\) have no overlap.
Warning
Partial overlap is not allowed.
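The comparison rules above can be written out as a standalone function. This sketch assumes closed ranges given as (start, end) pairs; compare_ranges is a hypothetical name, and returning "=" for identical ranges is an assumption, since the description only covers ranges that differ.

```python
from typing import Tuple

Range = Tuple[int, int]


def compare_ranges(r1: Range, r2: Range) -> str:
    """Compare two closed ranges according to containment."""
    (start1, end1), (start2, end2) = r1, r2
    if r1 == r2:
        return "="  # identical ranges; this return value is an assumption
    if start2 <= start1 and end1 <= end2:
        return "<"  # r1 is completely contained in r2
    if start1 <= start2 and end2 <= end1:
        return ">"  # r2 is completely contained in r1
    if end1 < start2 or end2 < start1:
        return "irrelevant"  # no overlap at all
    raise RuntimeError(f"Partial overlap between {r1} and {r2} is not allowed")


print(compare_ranges((2, 5), (1, 10)))
print(compare_ranges((1, 2), (5, 6)))
```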
sort_by_start(nodes)
staticmethod
A sorting function that sorts the nodes by their starting positions.
insert_child(node)
Insert a child RangeNode.
Child nodes have a smaller (inclusive) range, e.g., [2, 5] is a child of [1, 6].
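One way such an insertion can keep the tree consistent is sketched below with a small stand-in class. SketchNode is hypothetical and does not use NodeMixin; the descent, adoption and ordering steps are assumptions about how a range tree could behave, not a copy of the method.

```python
from typing import List, Optional


class SketchNode:
    """A minimal stand-in for a range tree node (hypothetical, for illustration)."""

    def __init__(self, start: int, end: int, name: Optional[str] = None):
        self.start, self.end, self.name = start, end, name
        self.children: List["SketchNode"] = []

    def covers(self, other: "SketchNode") -> bool:
        return self.start <= other.start and other.end <= self.end

    def insert_child(self, node: "SketchNode") -> None:
        if not self.covers(node):
            raise ValueError("a child range must lie inside its parent range")
        # Descend if an existing child already covers the new node.
        for child in self.children:
            if child.covers(node):
                child.insert_child(node)
                return
        adopted = [c for c in self.children if node.covers(c)]
        remaining = [c for c in self.children if not node.covers(c)]
        for child in remaining:
            if child.start <= node.end and node.start <= child.end:
                raise ValueError("partial overlap is not allowed")
        # Existing children covered by the new node become its children.
        for child in adopted:
            node.insert_child(child)
        # Keep siblings ordered by their starting positions.
        self.children = sorted(remaining + [node], key=lambda n: n.start)


root = SketchNode(0, 212, "EQV")
for start, end, name in [(110, 209, "EX."), (55, 210, "AND"), (6, 54, "A"), (61, 109, "B")]:
    root.insert_child(SketchNode(start, end, name))
print([child.name for child in root.children])
print([child.name for child in root.children[1].children])
```

The ranges in the usage example are taken from the rendered tree shown earlier, inserted out of order to show that the resulting structure does not depend on insertion order.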
render_tree()
Render the whole tree.
render_image()
Calling this function generates a temporary range_node.png file, which is then displayed.
To make this visualisation work, you need to install graphviz by, e.g., sudo apt install graphviz
Created: January 24, 2023