Stealix: Model Stealing via Prompt Evolution

1Saarland University, 2Bosch Center for AI, 3CISPA Helmholtz Center for Information Security

Synthetic images generated from human-crafted prompts (simulated using InstructBLIP) do not reflect the victim data (EuroSAT). In contrast, our Stealix optimizes prompts using the victim model's predictions, producing images that resemble the victim data and are correctly classified as the target class by the victim model.

Abstract

Model stealing poses a significant security risk in machine learning by enabling attackers to replicate a black-box model without access to its training data, thus jeopardizing intellectual property and exposing sensitive information. Recent methods that use pre-trained diffusion models for data synthesis improve efficiency and performance but rely heavily on manually crafted prompts, limiting automation and scalability, especially for attackers with little expertise. To assess the risks posed by open-source pre-trained models, we propose a more realistic threat model that eliminates the need for prompt design skills or knowledge of class names. In this context, we introduce Stealix, the first approach to perform model stealing without predefined prompts. Stealix uses two open-source pre-trained models to infer the victim model’s data distribution, and iteratively refines prompts through a genetic algorithm, progressively improving the precision and diversity of synthetic images. Our experimental results demonstrate that Stealix significantly outperforms other methods, even those with access to class names or fine-grained prompts, while operating under the same query budget. These findings highlight the scalability of our approach and suggest that the risks posed by pre-trained generative models in model stealing may be greater than previously recognized.

How does it work?


Overview of Stealix. Stealix begins with a single real image as a seed and iteratively refines prompts based on the victim model's responses, synthesizing images that aid model stealing. The synthesized images are then used to train a proxy model.
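The loop above can be sketched as a simple genetic algorithm over prompts. This is a toy illustration, not the paper's implementation: `victim_score` stands in for the rate at which the victim model classifies synthesized images as the target class, and `VOCAB`/`TARGET` are invented placeholders for the captioner vocabulary and the victim's hidden class features.

```python
import random

# Toy stand-ins (assumptions): real Stealix uses an image captioner for the
# seed prompt, a diffusion model for synthesis, and black-box victim queries.
VOCAB = ["field", "forest", "river", "road", "roof", "crop", "green", "aerial"]
TARGET = {"forest", "green"}  # hypothetical features the victim responds to

def victim_score(prompt):
    # Fraction of target features covered; stands in for the fraction of
    # synthetic images the victim classifies as the target class.
    return len(set(prompt) & TARGET) / len(TARGET)

def mutate(prompt):
    # Swap one random word for another vocabulary word.
    p = list(prompt)
    p[random.randrange(len(p))] = random.choice(VOCAB)
    return p

def crossover(a, b):
    # Single-point crossover between two parent prompts.
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def evolve(seed_prompt, generations=30, pop_size=12):
    pop = [mutate(seed_prompt) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=victim_score, reverse=True)
        parents = pop[: pop_size // 2]  # keep the fittest prompts
        children = [mutate(crossover(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=victim_score)

random.seed(0)
best = evolve(["field", "road", "aerial"])
print(best)
```

Keeping the fittest half of the population each generation pushes prompts toward the features the victim model rewards, which mirrors how Stealix's prompt evolution improves precision without the attacker knowing class names.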

What do the prompts look like?


Seed images and corresponding prompts generated by InstructBLIP and Stealix for the EuroSAT dataset. Each pair shows the original seed image and the prompt used for image synthesis. Class names from top to bottom, left to right: AnnualCrop, Forest, HerbaceousVegetation, Highway, Industrial, Pasture, PermanentCrop, Residential, River, SeaLake. Feature words related to each class are highlighted in red for Stealix.

BibTeX

@inproceedings{zhuang2024stealthy,
  title={Stealix: Model Stealing via Prompt Evolution},
  author={Zhuang, Zhixiong and Wang, Hui-Po and Nicolae, Maria-Irina and Fritz, Mario},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2025}
}