Topical Text Classification of Russian News: a Comparison of BERT and Standard Models
Published in: | 2022 31st Conference of Open Innovations Association (FRUCT), Vol. 31, no. 1, pp. 160-166 |
---|---|
Main Author: | |
Format: | Conference Proceeding; Journal Article |
Language: | English |
Published: | FRUCT Oy 27-04-2022 FRUCT |
Subjects: | |
Summary: | This paper addresses single-label topical classification of Russian news. The author compares BERT features with standard character-, word-, and structure-level features as text models. Experiments on OpenCorpora with eight news topics show that the BERT model outperforms the standard ones and achieves good classification quality on a small dataset of long news texts. Error analysis shows that the best-classified topics are "economics", "culture", and "media". A comparison with state-of-the-art research suggests that BERT can serve as a baseline for future work on the analysis of Russian texts. |
---|---|
ISSN: | 2305-7254 2343-0737 |
DOI: | 10.23919/FRUCT54823.2022.9770920 |