Power-Softmax: Towards Secure LLM Inference over Encrypted Data
Main Authors:
Format: Journal Article
Language: English
Published: 12-10-2024
Summary: Modern cryptographic methods for implementing privacy-preserving LLMs, such as Homomorphic Encryption (HE), require the LLMs to have a polynomial form. Forming such a representation is challenging because Transformers include non-polynomial components, such as Softmax and layer normalization. Previous approaches have either directly approximated pre-trained models with large-degree polynomials, which are less efficient to evaluate over HE, or replaced non-polynomial components with easier-to-approximate primitives before training, e.g., Softmax with pointwise attention; the latter approach can introduce scalability challenges. We present a new HE-friendly variant of self-attention that offers a stable form for training and is easy to approximate with polynomials for secure inference. Our work introduces the first polynomial LLMs with 32 layers and over a billion parameters, exceeding the size of previous models by more than tenfold. The resulting models demonstrate reasoning and in-context learning (ICL) capabilities comparable to standard Transformers of the same size, representing a breakthrough in the field. Finally, we provide a detailed latency breakdown for each computation over encrypted data, paving the way for further optimization, and explore the differences in inductive bias between Transformers relying on our HE-friendly variant and standard Transformers. Our code is attached as a supplement.
DOI: 10.48550/arxiv.2410.09457
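
The abstract describes replacing Softmax, whose exp() has no exact polynomial form, with an HE-friendly attention normalization that polynomials approximate well. As a minimal sketch (not the authors' released code; the paper's exact Power-Softmax formulation may differ), the snippet below contrasts standard Softmax with a power-based normalization of the assumed form z_i^p / sum_j z_j^p, which is a ratio of polynomials. The function name `power_softmax` and the parameter `p` are illustrative assumptions.

```python
# Minimal sketch: standard Softmax vs. an assumed power-based,
# HE-friendly normalization. NOT the paper's released code; the exact
# Power-Softmax definition may differ from this illustration.
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Standard Softmax: relies on exp(), which is non-polynomial."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=-1, keepdims=True)

def power_softmax(z: np.ndarray, p: int = 2, eps: float = 1e-6) -> np.ndarray:
    """Assumed power-based variant: z_i**p / sum_j z_j**p.

    Numerator and denominator are polynomials in z; an even p keeps
    the weights non-negative for signed attention scores, and eps
    guards against a zero denominator. Under HE, the division itself
    is typically handled by a low-degree polynomial approximation of
    the reciprocal.
    """
    zp = z ** p
    return zp / (zp.sum(axis=-1, keepdims=True) + eps)

scores = np.array([0.5, 1.0, 2.0])              # toy attention logits for one query
print("softmax      :", softmax(scores))        # ~ [0.14, 0.23, 0.63]
print("power_softmax:", power_softmax(scores))  # ~ [0.05, 0.19, 0.76]
```

Both functions map scores to non-negative weights that sum to (approximately) one, but only the power-based variant is built from additions, multiplications, and a single reciprocal, i.e., operations that HE schemes such as CKKS either support natively (addition, multiplication) or approximate cheaply (the reciprocal).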