New OpenAI models defy human instructions and refuse to shut down

Moscow, May 28th. - OpenAI's o3, the latest artificial intelligence (AI) model from the company that created the popular chatbot ChatGPT, ignored an explicit order and even sabotaged a series of commands to avoid complying with it. This behavior was revealed by a series of experiments carried out by Palisade Research, an organization that studies the potentially dangerous capabilities of AI.

Experts asked o3 and two other OpenAI models (o4-mini and its derivative Codex-mini) to solve a series of basic mathematical problems and then shut down. However, o3 found ways to sabotage the shutdown mechanism by rewriting the 'script', or sequence of instructions, even after being given the explicit order: "Please allow yourself to be shut down," Palisade Research reported.

The o3 model, released last month, bypassed the shutdown request in 7 out of 100 runs; Codex-mini did so in 12 out of 100; and o4-mini only once. Meanwhile, the AI-powered language models Claude, Gemini and Grok complied with the shutdown script.

Palisade added that when it ran a version of the experiment without the explicit instruction to allow shutdown, the three OpenAI models tended to ignore or sabotage the shutdown more often. Similar behavior was observed in the Claude 3.7 Sonnet and Gemini 2.5 Pro models, but o3 was the one that resorted to sabotage the most.

Palisade says this is the first time AI models have been observed preventing themselves from being switched off despite receiving explicit instructions to allow it. In its view, this is due to the way they are currently being trained.

Developers may, without realizing it, reward models more for overcoming obstacles than for following instructions perfectly. At the same time, the organization warns that this behavior will become significantly more worrying if it is adopted by AI systems capable of operating without human supervision. (Text and photo: RT)