Stealix: Model Stealing via Prompt Evolution

1Saarland University, 2Bosch Center for AI, 3CISPA Helmholtz Center for Information Security

Synthetic images generated from human-crafted prompts (simulated using InstructBLIP) do not reflect the victim data (EuroSAT). In contrast, our Stealix optimizes prompts using the victim model's predictions, producing images that resemble the victim data and are correctly classified as the target class by the victim model.

Abstract

Model stealing poses a significant security risk in machine learning by enabling attackers to replicate a black-box model without access to its training data, thus jeopardizing intellectual property and exposing sensitive information. Recent methods that use pre-trained diffusion models for data synthesis improve efficiency and performance but rely heavily on manually crafted prompts, limiting automation and scalability, especially for attackers with little expertise. To assess the risks posed by open-source pre-trained models, we propose a more realistic threat model that eliminates the need for prompt design skills or knowledge of class names. In this context, we introduce Stealix, the first approach to perform model stealing without predefined prompts. Stealix uses two open-source pre-trained models to infer the victim model’s data distribution, and iteratively refines prompts through a genetic algorithm, progressively improving the precision and diversity of synthetic images. Our experimental results demonstrate that Stealix significantly outperforms other methods, even those with access to class names or fine-grained prompts, while operating under the same query budget. These findings highlight the scalability of our approach and suggest that the risks posed by pre-trained generative models in model stealing may be greater than previously recognized.

How does it work?


Overview of Stealix. Stealix begins with a single real image as a seed and iteratively refines prompts based on the victim model's responses, synthesizing images that aid model stealing. The synthesized images are then used to train a proxy model.
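The loop above can be sketched as a simple genetic algorithm over prompts. This is a toy illustration, not the paper's implementation: `victim_score` stands in for the rate at which the victim model classifies synthesized images as the target class, and `VOCAB`/`TARGET` are invented placeholders for the captioner vocabulary and the victim's hidden class features.

```python
import random

# Toy stand-ins (assumptions): real Stealix uses an image captioner for the
# seed prompt, a diffusion model for synthesis, and black-box victim queries.
VOCAB = ["field", "forest", "river", "road", "roof", "crop", "green", "aerial"]
TARGET = {"forest", "green"}  # hypothetical features the victim responds to

def victim_score(prompt):
    # Fraction of target features covered; stands in for the fraction of
    # synthetic images the victim classifies as the target class.
    return len(set(prompt) & TARGET) / len(TARGET)

def mutate(prompt):
    # Swap one random word for another vocabulary word.
    p = list(prompt)
    p[random.randrange(len(p))] = random.choice(VOCAB)
    return p

def crossover(a, b):
    # Single-point crossover between two parent prompts.
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def evolve(seed_prompt, generations=30, pop_size=12):
    pop = [mutate(seed_prompt) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=victim_score, reverse=True)
        parents = pop[: pop_size // 2]  # keep the fittest prompts
        children = [mutate(crossover(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=victim_score)

random.seed(0)
best = evolve(["field", "road", "aerial"])
print(best)
```

Keeping the fittest half of the population each generation pushes prompts toward the features the victim model rewards, which mirrors how Stealix's prompt evolution improves precision without the attacker knowing class names.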

What do the prompts look like?


Seed images and corresponding prompts generated by InstructBLIP and Stealix for the EuroSAT dataset. Each pair shows the original seed image and the prompt used for image synthesis. Class names from top to bottom, left to right: AnnualCrop, Forest, HerbaceousVegetation, Highway, Industrial, Pasture, PermanentCrop, Residential, River, SeaLake. Feature words related to each class are highlighted in red for Stealix.

BibTeX

@inproceedings{zhuang2024stealthy,
  title={Stealix: Model Stealing via Prompt Evolution},
  author={Zhuang, Zhixiong and Wang, Hui-Po and Nicolae, Maria-Irina and Fritz, Mario},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2025}
}