Hidden-State Privacy Has an Empty Middle

No Gaussian mechanism tested achieves both moderate utility and moderate privacy against an adaptive attacker, and a lower bound shows that uniform Gaussian safety is impossible. The diagonal inverse-Fisher mechanism is minimax-optimal but sits on a sharp privacy-utility edge, while other methods fail catastrophically under adaptive attacks. These findings call for a shift from optimizing release mechanisms within a fixed class to co-designing the model architecture or release protocol itself for safe hidden-state release.

Deep Analysis

Background

The research addresses the challenge of releasing a neural network's internal representations (hidden states) so that they stay useful for downstream tasks while resisting adversarial extraction. In the core setting, a "retrieval attacker" tries to infer sensitive information from the released state; utility is quantified by the Fisher information the release preserves, and privacy by the Mahalanobis signal it leaves to the attacker.
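To make the two metrics concrete, here is a minimal NumPy sketch under assumptions that are not taken from the paper: the release is y = h + ε with ε ~ N(0, Σ), the attacker knows Σ and a finite pool of candidate hidden states, and it guesses the candidate with the smallest Mahalanobis distance to y (the maximum-likelihood guess under a uniform prior). For this Gaussian release the Fisher information about h is Σ⁻¹, and the Mahalanobis signal of a difference δ between two hidden states is δᵀΣ⁻¹δ. The function names and the random candidate pool are hypothetical.

```python
import numpy as np

def mahalanobis_signal(delta, sigma):
    """Squared Mahalanobis norm delta^T Sigma^{-1} delta of a hidden-state difference."""
    return float(delta @ np.linalg.solve(sigma, delta))

def gaussian_release_fisher(sigma):
    """Fisher information about h in y = h + N(0, Sigma); for a Gaussian shift model it is Sigma^{-1}."""
    return np.linalg.inv(sigma)

def retrieval_top1(candidates, sigma, n_trials, rng):
    """Empirical top-1 accuracy of a nearest-Mahalanobis-neighbour retrieval attacker."""
    n, d = candidates.shape
    chol = np.linalg.cholesky(sigma)  # used to sample noise from N(0, Sigma)
    precision = np.linalg.inv(sigma)
    hits = 0
    for _ in range(n_trials):
        i = rng.integers(n)  # index of the secret hidden state
        y = candidates[i] + chol @ rng.standard_normal(d)  # released noisy state
        diffs = candidates - y
        # Mahalanobis distance from y to every candidate; argmin is the ML guess.
        dists = np.einsum("nd,de,ne->n", diffs, precision, diffs)
        hits += int(np.argmin(dists) == i)
    return hits / n_trials

rng = np.random.default_rng(0)
candidates = rng.standard_normal((50, 16))  # hypothetical pool of 16-dim hidden states
print(retrieval_top1(candidates, 1e-4 * np.eye(16), 200, rng))  # near-noiseless release
print(retrieval_top1(candidates, 1e4 * np.eye(16), 200, rng))   # very noisy release
```

With almost no noise the attacker identifies the secret state essentially every time; with very large noise its accuracy falls toward chance (1/50 here), at the cost of destroying the Fisher information the downstream task relies on.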

Key Points

Empirical Failure of Gaussian Mechanisms

  • An exhaustive test of 1,536 Gaussian release covariances for single-layer hidden states found zero mechanisms achieving both "moderate utility" and "moderate privacy" against an adaptive retrieval attacker.
  • This reveals a stark, empty middle ground in the privacy-utility trade-off space for this mechanism class.

Theoretical Impossibility (Fisher-Ball Lower Bound)

  • A formal proof establishes that for any full-rank Gaussian release providing O(1) Fisher utility, there exists at least one direction where the Mahalanobis signal (the attacker's leverage) grows linearly with the hidden state's width.
  • This rules out uniform Gaussian safety, providing a theoretical basis for the observed empirical failure.
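Why width enters can be seen through a simple averaging argument. This is an illustration under added assumptions, not the paper's Fisher-ball proof. Assume the release is y = h + N(0, Σ), so its Fisher information is P = Σ⁻¹, and read "O(1) Fisher utility" as O(1) information per coordinate, i.e. trace(P) ≥ c·d for a hypothetical constant c. If the attacker may target any hidden-state difference δ ∈ {−1, +1}^d, then averaging δᵀPδ over all sign vectors gives exactly trace(P), because the cross terms cancel. Some δ therefore has a Mahalanobis signal of at least c·d, which grows linearly with the width d.

```python
import itertools
import numpy as np

def sign_vector_signals(precision):
    """Mahalanobis signal delta^T P delta for every delta in {-1, +1}^d (exhaustive, small d only)."""
    d = precision.shape[0]
    deltas = np.array(list(itertools.product((-1.0, 1.0), repeat=d)))  # shape (2**d, d)
    return np.einsum("kd,de,ke->k", deltas, precision, deltas)

def random_precision_with_trace(d, target_trace, rng):
    """Random symmetric positive-definite P, rescaled so that trace(P) == target_trace."""
    a = rng.standard_normal((d, d))
    p = a @ a.T + 0.1 * np.eye(d)
    return p * (target_trace / np.trace(p))

rng = np.random.default_rng(0)
c = 1.0  # hypothetical O(1) Fisher utility per coordinate
for d in (4, 8, 12):
    p = random_precision_with_trace(d, c * d, rng)
    s = sign_vector_signals(p)
    # The mean over all sign vectors equals trace(P) = c * d, so the max is at least c * d.
    print(d, round(float(s.mean()), 6), round(float(s.max()), 3))
```

The enumeration is exponential in d and only meant for small widths; the averaging identity itself holds at any width.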

Proposed Mechanisms and Their Limitations

  • Diagonal Inverse-Fisher Release (Σ★_diag(𝒦)): Identified as the unique minimax-optimal diagonal mechanism at a first-order KL budget 𝒦. It is the only tested release that holds the worst-case attacker's top-1 accuracy to ≤ 0.001 across a 32-point grid of models and layers.
    • Critical Caveat: This optimal mechanism sits on a privacy/utility edge. It is the "least bad" choice within the constrained Gaussian class, not a mechanism that fills the empty middle ground.
  • Generalized-Eigen Mechanism: Although it shows a 13× Pareto reduction under a Euclidean metric, the adaptive Mahalanobis attacker reaches 100% top-1 accuracy against it, so its apparent gain does not survive a tailored attack.
  • Sequence Inversion Attack: A full-trajectory sequence inverter recovers 94% of GPT-2 prefixes from clean hidden states but 0% once the diagonal inverse-Fisher mechanism is applied, confirming the mechanism's effectiveness against this specific attack.
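One plausible reading of a diagonal inverse-Fisher release is sketched below. The exact construction of Σ★_diag(𝒦) is not given here, so the formula is an assumption. Suppose the downstream task has a diagonal Fisher fᵢ with respect to each hidden coordinate, and read the "first-order KL budget" as the leading-order expected KL cost ½ Σᵢ fᵢ σᵢ² ≤ 𝒦 of adding independent noise with variances σᵢ². Setting σᵢ² = 2𝒦 / (d · fᵢ) spends the budget equally across coordinates, so the noise is largest where the task is least sensitive. The helper names are hypothetical, and the sketch makes no claim about minimax optimality.

```python
import numpy as np

def diag_inverse_fisher_variances(fisher_diag, kl_budget):
    """Hypothetical per-coordinate noise variances sigma_i^2 = 2K / (d * f_i)."""
    f = np.asarray(fisher_diag, dtype=float)
    if np.any(f <= 0):
        raise ValueError("Fisher diagonal must be strictly positive")
    return 2.0 * kl_budget / (f.size * f)

def first_order_kl(fisher_diag, variances):
    """Leading-order expected KL cost 0.5 * sum_i f_i * sigma_i^2 (= 0.5 * tr(F Sigma) for diagonal F, Sigma)."""
    f = np.asarray(fisher_diag, dtype=float)
    v = np.asarray(variances, dtype=float)
    return 0.5 * float(np.sum(f * v))

f = np.array([4.0, 1.0, 0.25])  # hypothetical per-coordinate Fisher values
var = diag_inverse_fisher_variances(f, kl_budget=0.3)
print(var)                     # high-Fisher coordinates get the least noise
print(first_order_kl(f, var))  # spends exactly the budget K, up to float rounding
print(0.5 * f * var)           # each coordinate contributes K / d
```

An isotropic release at the same budget would instead set every σᵢ² = 2𝒦 / Σⱼ fⱼ, adding the same noise to task-critical and task-irrelevant coordinates alike.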

Alternative Architectural Approach

  • A split-memory transformer, trained from scratch with the privacy objective integrated into the architecture, achieves a G_Mah ∈ [20, 33] score at 90M parameters.
  • This architectural solution maintains a 6–24× advantage over standard GPT models at matched parameter budgets (30M to 1B) for a fixed language-modeling loss penalty, highlighting the potential of co-design.
  • In contrast, pretrained models top out at a much lower G_Mah score of 9.3, underscoring the difficulty of retrofitting privacy after pre-training.

Significance

The results reframe the problem of hidden-state release. Within the Gaussian class, the empirical sweep and the Fisher-ball lower bound together show that optimizing the release mechanism (such as the noise covariance) for a fixed, standard model architecture cannot reach a robust privacy-utility balance. The paper therefore argues for "architecture or release co-design": building privacy into the model's structure from the outset, as the split-memory transformer does, or into fundamentally new release protocols, rather than applying it as an afterthought to the outputs of a standard black-box model.

Disclaimer: The above content is generated by AI and is for reference only.
