Corpus Poisoning via Approximate Greedy Gradient Descent
Main Authors: | , , |
---|---|
Format: | Journal Article |
Language: | English |
Published: | 07-06-2024 |
Subjects: | |
Summary: | Dense retrievers are widely used in information retrieval and have also been
successfully extended to other knowledge-intensive areas such as language
models, e.g., Retrieval-Augmented Generation (RAG) systems. Unfortunately, they
have recently been shown to be vulnerable to corpus poisoning attacks in which
a malicious user injects a small fraction of adversarial passages into the
retrieval corpus to trick the system into returning these passages among the
top-ranked results for a broad set of user queries. Further study is needed to
understand the extent to which these attacks could limit the deployment of
dense retrievers in real-world applications. In this work, we propose
Approximate Greedy Gradient Descent (AGGD), a new attack on dense retrieval
systems based on the widely used HotFlip method for efficiently generating
adversarial passages. We demonstrate that AGGD can select a higher quality set
of token-level perturbations than HotFlip by replacing its random token
sampling with a more structured search. Experimentally, we show that our method
achieves a high attack success rate across several datasets and retrievers,
and can generalize to unseen queries and new domains. Notably, our
method is extremely effective in attacking the ANCE retrieval model, achieving
attack success rates that are 15.24% and 17.44% higher on the NQ and MS MARCO
datasets, respectively, compared to HotFlip. Additionally, we demonstrate
AGGD's potential to replace HotFlip in other adversarial attacks, such as
knowledge poisoning of RAG systems. |
---|---|
DOI: | 10.48550/arxiv.2406.05087 |
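
The Summary contrasts HotFlip's random token sampling with a more structured search over token-level perturbations. The sketch below illustrates that contrast on a toy problem and is not the authors' implementation. Everything in it is an assumption made for illustration: the toy encoder (tanh of a position-weighted sum of token embeddings), the dot-product objective against a mean query embedding, all function names, and in particular the reading of "structured search" as ranking every (position, token) pair by its first-order gain rather than sampling one position at random.

```python
import math
import random


def encode(tokens, emb, weights):
    """Toy passage encoder (assumed): tanh of a position-weighted sum of token embeddings."""
    dim = len(emb[0])
    pooled = [sum(w * emb[t][d] for t, w in zip(tokens, weights)) for d in range(dim)]
    return [math.tanh(x) for x in pooled]


def objective(tokens, emb, weights, q_bar):
    """Dot product between the passage and the mean query embedding (to be maximized)."""
    return sum(q * p for q, p in zip(q_bar, encode(tokens, emb, weights)))


def position_gradients(tokens, emb, weights, q_bar):
    """Analytic gradient of the objective with respect to each position's token embedding."""
    p = encode(tokens, emb, weights)
    base = [q * (1.0 - x * x) for q, x in zip(q_bar, p)]  # derivative through tanh
    return [[w * b for b in base] for w in weights]


def first_order_gain(i, v, tokens, emb, grads):
    """HotFlip-style linear estimate of the objective change if token v is placed at position i."""
    return sum((emb[v][d] - emb[tokens[i]][d]) * grads[i][d] for d in range(len(grads[i])))


def random_position_candidates(tokens, emb, grads, k, rng):
    """Baseline: sample one position at random, keep its top-k tokens by first-order gain."""
    i = rng.randrange(len(tokens))
    ranked = sorted(range(len(emb)),
                    key=lambda v: first_order_gain(i, v, tokens, emb, grads),
                    reverse=True)
    return [(i, v) for v in ranked[:k]]


def all_position_candidates(tokens, emb, grads, k):
    """Structured variant (assumed): rank all (position, token) pairs, keep the global top-k."""
    pairs = [(i, v) for i in range(len(tokens)) for v in range(len(emb))]
    pairs.sort(key=lambda iv: first_order_gain(iv[0], iv[1], tokens, emb, grads), reverse=True)
    return pairs[:k]


def best_swap(tokens, emb, weights, q_bar, candidates):
    """Score each candidate swap exactly; return the best improving one, or None."""
    best, best_val = None, objective(tokens, emb, weights, q_bar)
    for i, v in candidates:
        trial = tokens[:i] + [v] + tokens[i + 1:]
        val = objective(trial, emb, weights, q_bar)
        if val > best_val:
            best, best_val = (i, v), val
    return best


def attack(tokens, emb, weights, q_bar, steps, k, rng=None):
    """Greedy loop: rng=None uses the structured candidates, otherwise the random baseline."""
    tokens = list(tokens)
    for _ in range(steps):
        grads = position_gradients(tokens, emb, weights, q_bar)
        if rng is None:
            cands = all_position_candidates(tokens, emb, grads, k)
        else:
            cands = random_position_candidates(tokens, emb, grads, k, rng)
        swap = best_swap(tokens, emb, weights, q_bar, cands)
        if swap is not None:
            tokens[swap[0]] = swap[1]
    return tokens


# Toy data: 50-token vocabulary, 8-dimensional embeddings, 4-token passage.
data_rng = random.Random(0)
emb = [[data_rng.uniform(-1, 1) for _ in range(8)] for _ in range(50)]
weights = [0.1, 0.2, 0.3, 0.4]
q_bar = [data_rng.uniform(-1, 1) for _ in range(8)]
start = [0, 1, 2, 3]

structured = attack(start, emb, weights, q_bar, steps=5, k=10)
baseline = attack(start, emb, weights, q_bar, steps=5, k=10, rng=random.Random(1))
print("start     :", objective(start, emb, weights, q_bar))
print("baseline  :", objective(baseline, emb, weights, q_bar))
print("structured:", objective(structured, emb, weights, q_bar))
```

Both loops accept a swap only when the exactly evaluated objective improves, so neither can end below its starting score. Which variant ends higher on this toy problem depends on the random seed and the data, and the outcome says nothing about the results reported in the Summary.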