Topical Text Classification of Russian News: a Comparison of BERT and Standard Models

Bibliographic Details
Published in: 2022 31st Conference of Open Innovations Association (FRUCT) Vol. 31; no. 1; pp. 160-166
Main Author: Lagutina, Ksenia
Format: Conference Proceeding; Journal Article
Language: English
Published: FRUCT Oy 27-04-2022
FRUCT
Description
Summary: The paper is devoted to the single-label topical classification of Russian news. The author compares BERT features with standard character-, word-, and structure-level features as text models. Experiments with OpenCorpora and eight news topics show that the BERT model is superior to the standard ones and achieves good classification quality on a small dataset of long news texts. Error analysis reveals the best-classified topics: "economics", "culture", and "media". Comparison with state-of-the-art research suggests that BERT can be considered a baseline for future research on the analysis of texts in Russian.
ISSN: 2305-7254
2343-0737
DOI: 10.23919/FRUCT54823.2022.9770920