Assessing the Effects of Hyperparameters on Knowledge Graph Embedding Quality
Embedding knowledge graphs into low-dimensional spaces is a popular method for applying approaches, such as link prediction or node classification, to these databases. This embedding process is very costly in terms of both computational time and space. Part of the reason for this is the optimisation...
Saved in:
Main Authors: | , , |
---|---|
Format: | Journal Article |
Language: | English |
Published: |
01-07-2022
|
Subjects: | |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Embedding knowledge graphs into low-dimensional spaces is a popular method
for applying approaches, such as link prediction or node classification, to
these databases. This embedding process is very costly in terms of both
computational time and space. Part of the reason for this is the optimisation
of hyperparameters, which involves repeatedly sampling, by random, guided, or
brute-force selection, from a large hyperparameter space and testing the
resulting embeddings for their quality. However, not all hyperparameters in
this search space will be equally important. In fact, with prior knowledge of
the relative importance of the hyperparameters, some could be eliminated from
the search altogether without significantly impacting the overall quality of
the outputted embeddings. To this end, we ran a Sobol sensitivity analysis to
evaluate the effects of tuning different hyperparameters on the variance of
embedding quality. This was achieved by performing thousands of embedding
trials, each time measuring the quality of embeddings produced by different
hyperparameter configurations. We regressed the embedding quality on those
hyperparameter configurations, using this model to generate Sobol sensitivity
indices for each of the hyperparameters. By evaluating the correlation between
Sobol indices, we find substantial variability in the hyperparameter
sensitivities between knowledge graphs, with differing dataset characteristics
being the probable cause of these inconsistencies. As an additional
contribution of this work we identify several relations in the UMLS knowledge
graph that may cause data leakage via inverse relations, and derive and present
UMLS-43, a leakage-robust variant of that graph. |
---|---|
DOI: | 10.48550/arxiv.2207.00473 |