Siamese Networks for Large-Scale Author Identification
Main Authors: | , |
---|---|
Format: | Journal Article |
Language: | English |
Published: | 22-12-2019 |
Subjects: | |
Summary: | Authorship attribution is the process of identifying the author of a text.
Approaches to tackling it have been conventionally divided into
classification-based ones, which work well for small numbers of candidate
authors, and similarity-based methods, which are applicable for larger numbers
of authors or for authors beyond the training set; these existing
similarity-based methods have only embodied static notions of similarity. Deep
learning methods, which blur the boundaries between classification-based and
similarity-based approaches, are promising in terms of ability to learn a
notion of similarity, but have previously only been used in a conventional
small-closed-class classification setup.
Siamese networks have been used to develop learned notions of similarity in
one-shot image tasks, and also for tasks of mostly semantic relatedness in NLP.
We examine their application to the stylistic task of authorship attribution on
datasets with large numbers of authors, looking at multiple energy functions
and neural network architectures, and show that they can substantially
outperform previous approaches. |
DOI: | 10.48550/arxiv.1912.10616 |