Hidden-State Privacy Has an Empty Middle

Background

The paper studies how to release internal representations (hidden states) of neural networks so that they stay useful for downstream tasks while resisting adversarial extraction. The threat model is a "retrieval attacker" who tries to infer sensitive information from the released state. Privacy and utility are quantified with metrics such as the Mahalanobis signal and Fisher information.

Key Points

Empirical Failure of Gaussian Mechanisms

An exhaustive test of 1,536 Gaussian release covariances for single-layer hidden states found zero mechanisms achieving both "moderate utility" and "moderate privacy" against an adaptive retrieval attacker.
This reveals a stark, empty middle ground in the privacy-utility trade-off space for this mechanism class.

Theoretical Impossibility (Fisher-Ball Lower Bound)

A formal proof establishes that for any full-rank Gaussian release providing O(1) Fisher utility, there exists at least one direction where the Mahalanobis signal (the attacker's leverage) grows linearly with the hidden state's width.
This rules out uniform Gaussian safety, providing a theoretical basis for the observed empirical failure.
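A toy calculation can make the width dependence concrete. The sketch below is not the paper's proof and does not model the Fisher-utility constraint. It assumes the Mahalanobis signal between two candidate hidden states is the standard squared Mahalanobis distance of their difference under the release noise covariance, restricted to a diagonal covariance for simplicity. The function name `mahalanobis_signal` and the unit per-coordinate gap are illustrative choices, not taken from the source.

```python
def mahalanobis_signal(delta, noise_var):
    """Squared Mahalanobis distance of `delta` under diagonal Gaussian noise.

    delta:     per-coordinate difference between two candidate hidden states
    noise_var: per-coordinate noise variances (diagonal of the covariance)
    """
    return sum(d * d / v for d, v in zip(delta, noise_var))


# Assumption for illustration: every coordinate separates the two candidates
# by 1.0 and receives unit-variance noise, so each coordinate adds 1.0.
signals = {}
for width in (16, 64, 256):
    delta = [1.0] * width
    noise_var = [1.0] * width
    signals[width] = mahalanobis_signal(delta, noise_var)

print(signals)  # {16: 16.0, 64: 64.0, 256: 256.0}
```

With per-coordinate noise held fixed, the attacker's signal in this toy setting scales linearly with width, which is the flavor of growth the bound describes for at least one direction.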

Proposed Mechanisms and Their Limitations

Diagonal Inverse-Fisher Release (Σ★_diag(𝒦)): Identified as the unique minimax-optimal diagonal mechanism at a first-order KL budget 𝒦. It is the only tested release maintaining a very low top-1 accuracy (≤ 0.001) for the worst-case attacker across a 32-point model layer grid.
- Critical Caveat: This optimal mechanism sits on the privacy/utility edge rather than in the empty middle. It is the "least bad" choice within the constrained Gaussian class, not a balanced one.
Generalized-Eigen Mechanism: It shows a 13× Pareto reduction under a Euclidean metric, but its privacy collapses against the adaptive Mahalanobis attacker, who reaches 100% top-1 accuracy. A mechanism that looks safe under one metric can be fully exposed to an attack tailored to its noise.
Sequence Inversion Attack: A full-trajectory sequence inverter recovers 94% of clean GPT-2 prefixes from clean hidden states but 0% once the diagonal inverse-Fisher mechanism is applied, which confirms the mechanism's effectiveness against this specific attack.
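The gap between a Euclidean and a Mahalanobis attacker can be reproduced in a small simulation. The sketch below is a toy setup, not the paper's experiment: it assumes two candidate states in two dimensions, diagonal Gaussian release noise that is large in one coordinate and small in the other, and a nearest-candidate retrieval attacker. All names (`retrieve`, `top1_accuracy`) and numbers are hypothetical.

```python
import random


def sq_dist(x, c, inv_var=None):
    """Squared distance from x to c: Euclidean if inv_var is None,
    otherwise diagonal Mahalanobis with weights inv_var."""
    if inv_var is None:
        inv_var = [1.0] * len(x)
    return sum(w * (xi - ci) ** 2 for xi, ci, w in zip(x, c, inv_var))


def retrieve(x, candidates, inv_var=None):
    """Return the index of the candidate nearest to the released vector x."""
    return min(range(len(candidates)),
               key=lambda k: sq_dist(x, candidates[k], inv_var))


def top1_accuracy(candidates, noise_std, use_mahalanobis, trials=2000, seed=0):
    """Fraction of trials in which the attacker retrieves the true candidate."""
    rng = random.Random(seed)
    inv_var = [1.0 / s ** 2 for s in noise_std] if use_mahalanobis else None
    hits = 0
    for _ in range(trials):
        true_idx = rng.randrange(len(candidates))
        # Gaussian release: true state plus independent per-coordinate noise.
        released = [c + rng.gauss(0.0, s)
                    for c, s in zip(candidates[true_idx], noise_std)]
        hits += retrieve(released, candidates, inv_var) == true_idx
    return hits / trials


# Assumed toy values: heavy noise on coordinate 0, light noise on coordinate 1.
candidates = [[0.0, 0.0], [1.0, 1.0]]
noise_std = [10.0, 0.1]

euclid_acc = top1_accuracy(candidates, noise_std, use_mahalanobis=False)
mahal_acc = top1_accuracy(candidates, noise_std, use_mahalanobis=True)
print(f"Euclidean attacker top-1:   {euclid_acc:.3f}")
print(f"Mahalanobis attacker top-1: {mahal_acc:.3f}")
```

In this setup the Euclidean attacker is swamped by the noisy coordinate and performs near chance, while the Mahalanobis attacker down-weights that coordinate and reads the candidate off the quiet one. This mirrors why a mechanism evaluated only under a Euclidean metric can look far safer than it is.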

Alternative Architectural Approach

A split-memory transformer, trained from scratch with the privacy objective integrated into the architecture, achieves G_Mah scores in the range [20, 33] at 90M parameters.
This architectural solution maintains a 6–24× advantage over standard GPT models of the same parameter budget (30M to 1B) at a fixed language-modeling loss penalty, highlighting the potential of co-design.
In contrast, pretrained models top out at a much lower G_Mah score of 9.3, underscoring the difficulty of retrofitting privacy after pre-training.

Significance

The results reframe the problem of hidden-state release. According to the paper, optimizing a release mechanism such as a Gaussian covariance on top of a fixed, standard model architecture is a dead end for achieving a robust privacy-utility balance: the empirical sweep and the Fisher-ball bound both point to an inherent limitation of the Gaussian class. The paper therefore argues for a shift towards "architecture or release co-design". Privacy should either be built into the model's structure from the outset, as the split-memory transformer demonstrates, or into fundamentally new release protocols, rather than being applied as an afterthought to the outputs of a standard black-box model.
