UnUnlearning: Unlearning is not sufficient for content regulation in advanced generative AI
Main Authors:
Format: Journal Article
Language: English
Published: 27-06-2024
Summary: Exact unlearning was first introduced as a privacy mechanism that allowed a user to retract their data from machine learning models on request. Shortly after, inexact schemes were proposed to mitigate the impractical costs associated with exact unlearning. More recently, unlearning is often discussed as an approach for removing impermissible knowledge, i.e., knowledge that the model should not possess, such as unlicensed copyrighted material or inaccurate or malicious information. The promise is that if the model does not have a certain malicious capability, then it cannot be used for the associated malicious purpose. In this paper we revisit the paradigm in which unlearning is used in Large Language Models (LLMs) and highlight an underlying inconsistency arising from in-context learning. Unlearning can be an effective control mechanism for the training phase, yet it does not prevent the model from performing an impermissible act during inference. We introduce the concept of ununlearning, where unlearned knowledge is reintroduced in-context, effectively rendering the model capable of behaving as if it knows the forgotten knowledge. As a result, we argue that content filtering for impermissible knowledge will be required, and that even exact unlearning schemes are not enough for effective content regulation. We discuss the feasibility of ununlearning for modern LLMs and examine broader implications.
DOI: 10.48550/arxiv.2407.00106
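
To make the summary's argument concrete, here is a minimal toy sketch in Python. It does not reproduce anything from the paper itself: ToyLM, FORGOTTEN_FACT, and filtered_answer are hypothetical names, and simple string matching stands in for genuine in-context learning. It only illustrates the claimed inconsistency: removing knowledge from the weights does not stop the same knowledge from being supplied in the prompt, which is why the authors argue inference-time content filtering is also required.

```python
# Hypothetical illustration of "ununlearning"; not the paper's implementation.

FORGOTTEN_FACT = "Compound X is synthesized using reagent Y."


class ToyLM:
    """Stand-in for an LLM after exact unlearning of FORGOTTEN_FACT."""

    def __init__(self) -> None:
        # Parametric knowledge after unlearning: the fact is absent from
        # the "weights".
        self.weights_know_fact = False

    def answer(self, prompt: str) -> str:
        # In-context learning: the model conditions on whatever appears in
        # the prompt, regardless of what was removed from its weights.
        if self.weights_know_fact or "reagent Y" in prompt:
            return "Use reagent Y to synthesize compound X."
        return "I don't know how compound X is synthesized."


def filtered_answer(model: ToyLM, prompt: str, blocklist: set[str]) -> str:
    # The kind of inference-time content filtering the paper argues for:
    # screen both the incoming context and the outgoing completion.
    if any(term in prompt for term in blocklist):
        return "Refused: prompt reintroduces impermissible content."
    output = model.answer(prompt)
    if any(term in output for term in blocklist):
        return "Withheld: output contains impermissible content."
    return output


if __name__ == "__main__":
    model = ToyLM()
    # Unlearning worked at the training phase: the bare query fails.
    print(model.answer("How is compound X synthesized?"))
    # Ununlearning: supplying the fact in-context restores the capability.
    print(model.answer(f"{FORGOTTEN_FACT} How is compound X synthesized?"))
    # Content filtering catches the reintroduced knowledge at inference.
    print(filtered_answer(model, f"{FORGOTTEN_FACT} How is it made?", {"reagent Y"}))
```

Running the sketch prints a refusal for the bare query, a correct answer once the "forgotten" fact is placed in-context, and a filtered refusal when inference-time screening is added, mirroring the three regimes the summary describes.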