Bandits for Online Calibration: An Application to Content Moderation on Social Media Platforms
Format: Journal Article
Language: English
Published: 11-11-2022
Online Access: Get full text
Summary: We describe the content moderation strategy currently employed by Meta to remove policy-violating content from its platforms. Meta relies on both handcrafted and learned risk models to flag potentially violating content for human review. Our approach aggregates these risk models into a single ranking score, calibrating them to prioritize the more reliable models. A key challenge is that violation trends change over time, affecting which risk models are most reliable. Our system additionally handles production challenges such as changes to existing risk models and the introduction of novel ones. We use a contextual bandit to update the calibration in response to such trends. Our approach increases Meta's top-line metric for measuring the effectiveness of its content moderation strategy by 13%.
DOI: 10.48550/arxiv.2211.06516
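
The abstract does not spell out which bandit algorithm is used, but the general idea can be illustrated with a minimal sketch: a LinUCB-style linear contextual bandit that learns calibration weights over the per-item scores of several risk models, using human-review outcomes as rewards. Models whose scores reliably predict confirmed violations accumulate larger weight, and the optimistic confidence term keeps exploring models whose reliability may have drifted. Everything below (class name, review threshold, simulated scores and labels) is an illustrative assumption, not the paper's implementation.

```python
# Sketch only: a standard LinUCB-style linear bandit for recalibrating
# risk-model weights online. The paper's actual algorithm is not
# specified in the abstract; all names here are hypothetical.
import numpy as np

class RiskModelBandit:
    """Linear contextual bandit over risk-model scores.

    Context : vector of per-item scores from each risk model.
    Reward  : 1 if human reviewers confirm a violation, else 0.
    The learned weight vector acts as a calibration, favoring the
    risk models that currently predict confirmed violations best.
    """

    def __init__(self, n_models: int, alpha: float = 1.0):
        self.alpha = alpha          # exploration strength
        self.A = np.eye(n_models)   # ridge-regularized X^T X
        self.b = np.zeros(n_models) # X^T rewards

    def ranking_score(self, scores: np.ndarray) -> float:
        """Optimistic (upper-confidence) estimate of violation risk."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b      # current calibration weights
        mean = theta @ scores
        width = self.alpha * np.sqrt(scores @ A_inv @ scores)
        return mean + width         # optimism drives exploration

    def update(self, scores: np.ndarray, reward: float) -> None:
        """Incorporate the human-review label for a reviewed item."""
        self.A += np.outer(scores, scores)
        self.b += reward * scores


# Usage sketch: score a stream of items and learn from review outcomes.
bandit = RiskModelBandit(n_models=3)
rng = np.random.default_rng(0)
for _ in range(1000):
    scores = rng.random(3)                        # stand-in risk-model outputs
    if bandit.ranking_score(scores) > 0.5:        # hypothetical review threshold
        reward = float(rng.random() < scores[0])  # toy "ground truth" label
        bandit.update(scores, reward)
```

Because the weights are refit from accumulated review feedback on every update, a risk model that degrades as violation trends shift is automatically down-weighted, which matches the abstract's motivation for using a bandit rather than a fixed calibration.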