Introducing the Gab Hate Corpus: defining and applying hate-based rhetoric to social media posts at scale
We present the Gab Hate Corpus (GHC), consisting of 27,665 posts from the social network service gab.com, each annotated for the presence of “hate-based rhetoric” by a minimum of three annotators. Posts were labeled according to a coding typology derived from a synthesis of hate speech definitions a...
Saved in:
Published in: | Language resources and evaluation Vol. 56; no. 1; pp. 79 - 108 |
---|---|
Main Authors: | , , , , , , , , , , , , , , , , , , , , , |
Format: | Journal Article |
Language: | English |
Published: |
Dordrecht
Springer Netherlands
01-03-2022
Springer Nature B.V |
Subjects: | |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | We present the Gab Hate Corpus (GHC), consisting of 27,665 posts from the social network service gab.com, each annotated for the presence of “hate-based rhetoric” by a minimum of three annotators. Posts were labeled according to a coding typology derived from a synthesis of hate speech definitions across legal precedent, previous hate speech coding typologies, and definitions from psychology and sociology, comprising hierarchical labels indicating dehumanizing and violent speech as well as indicators of targeted groups and rhetorical framing. We provide inter-annotator agreement statistics and perform a classification analysis in order to validate the corpus and establish performance baselines. The GHC complements existing hate speech datasets in its theoretical grounding and by providing a large, representative sample of richly annotated social media posts. |
---|---|
ISSN: | 1574-020X 1574-0218 |
DOI: | 10.1007/s10579-021-09569-x |