OpenAI presented Trading Inference-Time Compute for Adversarial Robustness, a research paper that investigates the relationship between inference-time compute and the robustness of AI models against adversarial attacks. The research, conducted using reasoning models such as OpenAI's o1-preview and o1-mini, provides initial evidence that allowing models more time and resources during inference can reduce their vulnerability to various types of adversarial attacks.
Adversarial attacks, which involve subtle, often imperceptible perturbations to input data, have long been a challenge in AI. These attacks can cause models to misclassify inputs or produce incorrect outputs, even when the changes are undetectable to humans. Despite extensive research, effective defenses against such attacks remain elusive. Increasing model size alone has not proven sufficient to address this issue.
The study examined how increasing inference-time compute, essentially giving models more "thinking" time, affects their robustness. Experiments were conducted across a range of tasks, including mathematical problem-solving, fact-based question answering, and image classification. The results showed that, in many cases, the probability of a successful adversarial attack decreased as the amount of inference-time compute increased. This improvement occurred without adversarial training or prior knowledge of the attack type.
The research also introduced new types of adversarial attacks tailored to reasoning models. These include many-shot attacks, where adversaries provide multiple misleading examples, and soft-token attacks, which optimize embedding vectors to achieve adversarial goals. Additionally, the study explored "Think Less" attacks, which attempt to reduce the model's inference-time compute, making it more vulnerable, and "Nerd Sniping" attacks, which exploit unproductive reasoning loops where the model spends excessive compute without improving robustness.
Comments on OpenAI's post on X revealed a mix of excitement for advancements in AI robustness and safety, curiosity for more technical details, and skepticism about potential misuse or the sufficiency of improvements.
User Paddy Sham, shared:
I think this is actually important for more people to understand grasp the ideas in algorithmic and data bias vulnerabilities when building these models for the future. Especially ones that are hard to detect because of the way our minds work. For a machine system might be pretty easy to detect the patterns and form a bias.
While user Robert Nichols commented:
An intriguing perspective on balancing computational efficiency with security! It raises essential questions about the trade-offs in AI models. Do you believe this approach could pave the way for more robust systems in real-world applications?
While increased compute generally reduced attack success rates, the study identified limitations. In cases where policies or goals are unclear, attackers can exploit loopholes, and increased compute does not always help. Models may also sometimes spend compute inefficiently, leading to vulnerabilities.
The full details of the study, including limitations and open questions, are available in the published paper.