Embedding With Preservation of Semantics of the Original Data

Bibliographic Details
Published in:	Doklady Belorusskogo gosudarstvennogo universiteta informatiki i radioèlektroniki Vol. 20; no. 2; pp. 46 - 52
Main Authors:	Vatkin, M. E., Vorobey, D. A., Yakovlev, M. V., Krivova, M. G.
Format:	Journal Article
Language:	English, Russian
Published:	Educational institution «Belarusian State University of Informatics and Radioelectronics», 05-04-2022
Subjects:	autoencoder; data embedding; linear space; loss function; machine learning; vector

Description
Summary:	Data describing objects is often represented as sparse vectors with a large number of features. Working with such vectors can be computationally inefficient and often leads to overfitting, so dimensionality reduction algorithms are used, one of which is the autoencoder. In this article, we propose a new approach for evaluating the properties of the resulting lower-dimensional vectors, as well as a loss function based on this approach. The idea of the proposed loss function is to evaluate how well the semantic structure is preserved in the embedding space, and to add that metric to the loss function so that relations between objects are preserved in the embedding space and more useful information about the objects is retained. The results show that combining the mean squared error loss function with the proposed one improves the quality of the embeddings.
ISSN:	1729-7648, 2708-0382
DOI:	10.35596/1729-7648-2022-20-2-46-52
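
Illustration:	The summary describes adding a structure-preservation metric to the mean squared error loss, but this record does not give the article's formula. The sketch below is therefore only an assumption of how such a combination might look: the penalty compares scale-normalised pairwise Euclidean distances before and after embedding, and the names `structure_penalty`, `combined_loss`, and the weight `alpha` are hypothetical, not taken from the article.

```python
import math


def mse(x, x_hat):
    """Mean squared reconstruction error over a batch of vectors."""
    count = sum(len(row) for row in x)
    total = sum((a - b) ** 2
                for row, row_hat in zip(x, x_hat)
                for a, b in zip(row, row_hat))
    return total / count


def pairwise_distances(vectors):
    """Euclidean distances for all pairs i < j, in a fixed order."""
    return [math.dist(vectors[i], vectors[j])
            for i in range(len(vectors))
            for j in range(i + 1, len(vectors))]


def structure_penalty(x, z):
    """Assumed stand-in for the article's metric (not its actual formula):
    mean squared difference between pairwise distances in the input space
    and in the embedding space, each divided by its largest distance."""
    dx = pairwise_distances(x)
    dz = pairwise_distances(z)
    if not dx:
        return 0.0  # fewer than two objects: no relations to preserve
    scale_x = max(dx) or 1.0
    scale_z = max(dz) or 1.0
    return sum((a / scale_x - b / scale_z) ** 2
               for a, b in zip(dx, dz)) / len(dx)


def combined_loss(x, x_hat, z, alpha=0.5):
    """Reconstruction error plus the weighted structure penalty.
    alpha is a hypothetical weighting hyperparameter."""
    return mse(x, x_hat) + alpha * structure_penalty(x, z)


# Three collinear points; z_good keeps their ordering, z_bad swaps two of them.
x = [[0.0, 0.0, 0.0], [3.0, 4.0, 0.0], [6.0, 8.0, 0.0]]
z_good = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
z_bad = [[0.0, 0.0], [2.0, 0.0], [1.0, 0.0]]
print(combined_loss(x, x, z_good), combined_loss(x, x, z_bad))
```

With a perfect reconstruction, the embedding that keeps the relative distances between objects receives the lower combined loss, which is the behaviour the summary attributes to the proposed term.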