Hierarchy Transformers (HiTs)

Language Models as Hierarchy Encoders
University of Oxford, University of Cambridge, University of Manchester
NeurIPS 2024

About

Hierarchy Transformer (HiT) is a framework that enables transformer encoder-based language models (LMs) to learn hierarchical structures in hyperbolic space. The main idea is to construct a Poincaré ball that directly circumscribes the output embedding space of LMs, leveraging the exponential expansion of hyperbolic space to organise entity embeddings hierarchically. In addition to presenting this framework (see code on GitHub), we are committed to training and releasing HiT models across various hierarchies. The models and datasets will be accessible on HuggingFace.
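
As a rough, self-contained illustration of working with a Poincaré ball (not the exact construction used in the paper), the sketch below uses the geoopt library; the curvature value and embedding dimension are placeholders chosen purely for the example:

import torch
import geoopt

# illustrative sketch only: a Poincaré ball manifold with a placeholder curvature;
# the paper derives the ball so that it circumscribes the LM output embedding space
embedding_dim = 384  # assumed dimension of a MiniLM-style encoder
manifold = geoopt.PoincareBall(c=1.0)

# project some Euclidean vectors (standing in for LM outputs) onto the ball
# so that hyperbolic distances between them are well defined
euclidean_outputs = torch.randn(4, embedding_dim) * 0.1
points = manifold.projx(euclidean_outputs)

# hyperbolic distance between the first two points
print(manifold.dist(points[0], points[1]))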

Get Started

The code repository of Hierarchy Transformers primarily extends Sentence Transformers; the main HiT model class HierarchyTransformer inherits from SentenceTransformer and can be loaded with the following code:

from hierarchy_transformers import HierarchyTransformer

# load the model
model = HierarchyTransformer.from_pretrained('Hierarchy-Transformers/HiT-MiniLM-L12-WordNetNoun')
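
The loaded model behaves like a regular SentenceTransformer and additionally exposes the Poincaré ball its embeddings live on via model.manifold, which is used below for computing hyperbolic distances:

# inspect the hyperbolic manifold underlying the model's embedding space
print(model.manifold)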

A different model variant can be loaded with the revision parameter. For example:

# load the model with a particular revision
model = HierarchyTransformer.from_pretrained('Hierarchy-Transformers/HiT-MiniLM-L12-WordNetNoun', revision="v1-hard-negatives")

To batch encode a list of entity names:

# entity names to be encoded.
entity_names = ["computer", "personal computer", "fruit", "berry"]

# get the entity embeddings
entity_embeddings = model.encode(entity_names)
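
With the default SentenceTransformers settings, the result is a NumPy array with one row per entity name; the second dimension is the model's embedding size (for example, 384 for a MiniLM-L12 backbone):

# shape: (number of entity names, embedding dimension)
print(entity_embeddings.shape)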

Use the entity embeddings to predict the subsumption relationships between them:

# suppose we want to compare "personal computer" and "computer", "berry" and "fruit"
child_entity_embeddings = model.encode(["personal computer", "berry"], convert_to_tensor=True)
parent_entity_embeddings = model.encode(["computer", "fruit"], convert_to_tensor=True)

# compute the hyperbolic distances and norms of entity embeddings
dists = model.manifold.dist(child_entity_embeddings, parent_entity_embeddings)
child_norms = model.manifold.dist0(child_entity_embeddings)
parent_norms = model.manifold.dist0(parent_entity_embeddings)

# use the empirical scoring function for subsumption prediction proposed in the paper;
# `centri_score_weight` (and the decision threshold) are determined on the validation set
centri_score_weight = 1.0  # placeholder value for illustration
subsumption_scores = - (dists + centri_score_weight * (parent_norms - child_norms))
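
A pair can then be predicted as a subsumption by thresholding the score; the threshold below is a hypothetical placeholder, as the actual value should be selected on the validation set:

# hypothetical threshold for illustration; choose it on the validation set
threshold = 0.0
predictions = subsumption_scores > threshold  # True where the child is predicted to be subsumed by the parent
print(predictions)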

Citation

Our paper has been accepted at NeurIPS 2024 (to appear).

@article{he2024language,
  title={Language Models as Hierarchy Encoders},
  author={He, Yuan and Yuan, Zhangdie and Chen, Jiaoyan and Horrocks, Ian},
  journal={arXiv preprint arXiv:2401.11374},
  year={2024}
}