Introducing the Gab Hate Corpus: defining and applying hate-based rhetoric to social media posts at scale

Bibliographic Details
Published in: Language Resources and Evaluation, Vol. 56, no. 1, pp. 79-108
Main Authors: Kennedy, Brendan, Atari, Mohammad, Davani, Aida Mostafazadeh, Yeh, Leigh, Omrani, Ali, Kim, Yehsong, Coombs, Kris, Havaldar, Shreya, Portillo-Wightman, Gwenyth, Gonzalez, Elaine, Hoover, Joe, Azatian, Aida, Hussain, Alyzeh, Lara, Austin, Cardenas, Gabriel, Omary, Adam, Park, Christina, Wang, Xin, Wijaya, Clarisa, Zhang, Yong, Meyerowitz, Beth, Dehghani, Morteza
Format: Journal Article
Language: English
Published: Dordrecht, Springer Netherlands, 01-03-2022
Springer Nature B.V.
Description
Summary: We present the Gab Hate Corpus (GHC), consisting of 27,665 posts from the social network service gab.com, each annotated for the presence of “hate-based rhetoric” by a minimum of three annotators. Posts were labeled according to a coding typology derived from a synthesis of hate speech definitions across legal precedent, previous hate speech coding typologies, and definitions from psychology and sociology, comprising hierarchical labels indicating dehumanizing and violent speech as well as indicators of targeted groups and rhetorical framing. We provide inter-annotator agreement statistics and perform a classification analysis in order to validate the corpus and establish performance baselines. The GHC complements existing hate speech datasets in its theoretical grounding and by providing a large, representative sample of richly annotated social media posts.
ISSN: 1574-020X, 1574-0218
DOI: 10.1007/s10579-021-09569-x