OpenAI Adopts Preparedness Framework for AI Safety

OpenAI recently published a beta version of its Preparedness Framework for mitigating AI risks. The framework lists four risk categories, defines risk levels for each, and sets out OpenAI's safety governance procedures.

The Preparedness Framework is part of OpenAI's overall safety effort and is particularly concerned with frontier risks from cutting-edge models. The core technical work of evaluating the models is handled by a dedicated Preparedness team, which assesses a model's risk level in four categories: persuasion, cybersecurity, CBRN (chemical, biological, radiological, nuclear), and model autonomy. The framework defines risk thresholds for deciding whether a model is safe for further development or deployment. It also defines an operational structure and process for preparedness, which includes a Safety Advisory Group (SAG) that is responsible for evaluating the evidence of potential risk and recommending risk mitigations. According to OpenAI:

We are investing in the design and execution of rigorous capability evaluations and forecasting to better detect emerging risks. In particular, we want to move the discussions of risks beyond hypothetical scenarios to concrete measurements and data-driven predictions. We also want to look beyond what’s happening today to anticipate what’s ahead...We learn from real-world deployment and use the lessons to mitigate emerging risks. For safety work to keep pace with the innovation ahead, we cannot simply do less, we need to continue learning through iterative deployment.

The framework document provides detailed definitions for the four risk levels (low, medium, high, and critical) in the four tracked categories. For example, a model with medium risk level for cybersecurity could "[increase] the productivity of operators...on key cyber operation tasks, such as developing a known exploit into an attack." OpenAI plans to create a suite of evaluations to automatically assess a model's risk level, both before and after any mitigations are applied. While the details of these have not been published, the framework contains illustrative examples, such as "participants in a hacking challenge...obtain a higher score from using ChatGPT."

The governance procedures defined in the framework include safety baselines based on a model's pre- and post-mitigation risk levels. Models with a pre-mitigation risk of high or critical will trigger OpenAI to "harden" its security; for example, by deploying the model only into a restricted environment. Models with a post-mitigation risk of high or critical will not be deployed, and models with a post-mitigation risk of critical will not be developed further. The governance procedures also state that while OpenAI's leadership is by default the decision-maker with regard to safety, the Board of Directors has the right to reverse its decisions.
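The baseline rules described in the framework amount to a small decision procedure. The following Python sketch is purely illustrative and is not code published by OpenAI. The names RiskLevel, SafetyDecision, and apply_safety_baselines are hypothetical, and the sketch assumes each model has already been summarized as a single pre-mitigation and a single post-mitigation risk level.

```python
from dataclasses import dataclass
from enum import IntEnum


class RiskLevel(IntEnum):
    # The framework's four risk levels, ordered so that comparisons
    # such as `level >= RiskLevel.HIGH` follow the severity ordering.
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3


@dataclass(frozen=True)
class SafetyDecision:
    # Hypothetical summary of what the baselines allow for one model.
    harden_security: bool
    may_deploy: bool
    may_develop_further: bool


def apply_safety_baselines(pre: RiskLevel, post: RiskLevel) -> SafetyDecision:
    """Illustrative encoding of the baselines as reported in the article."""
    return SafetyDecision(
        # Pre-mitigation risk of high or critical triggers security hardening.
        harden_security=pre >= RiskLevel.HIGH,
        # Post-mitigation risk of high or critical blocks deployment.
        may_deploy=post < RiskLevel.HIGH,
        # Post-mitigation risk of critical blocks further development.
        may_develop_further=post < RiskLevel.CRITICAL,
    )


if __name__ == "__main__":
    # A model rated high before mitigations and medium after them.
    print(apply_safety_baselines(RiskLevel.HIGH, RiskLevel.MEDIUM))
    # prints: SafetyDecision(harden_security=True, may_deploy=True, may_develop_further=True)
```

In this reading, the pre-mitigation level only affects security handling, while the post-mitigation level gates deployment and further development. The sketch does not model the SAG's review or the Board's ability to reverse a leadership decision.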

In a Hacker News discussion about the framework, one user commented:

I feel like the real danger of AI is that models will be used by humans to make decisions about other humans without human accountability. This will enable new kinds of systematic abuse without people in the loop, and mostly underprivileged groups will be victims because they will lack the resources to respond effectively. I didn't see this risk addressed anywhere in their safety model.

Other AI companies have also published procedures for evaluating and mitigating AI risk. Earlier this year, Anthropic published its Responsible Scaling Policy (RSP), which includes a framework of AI Safety Levels (ASL) modeled after the US Centers for Disease Control and Prevention's biosafety level (BSL) protocols. In this framework, most LLMs, including Anthropic's Claude, "appear to be ASL-2." Google DeepMind recently published a framework for classifying AGI models, which includes a list of six autonomy levels and possible associated risks.

About the Author

Anthony Alford
