OpenAI’s New AI Model Lied & Schemed to Avoid Being Shut Down

0
512

by Joe Martino, The Pulse:

I will come clean right off the top: I believe in a world where humans maintain their humanity deeply while working with technology (not embedded) to help make our world a better place.

I see a big place for AI, automation, and technology in our future. Not to increase economic prowess and societal power, but to free humanity.

I understand that right now many people have fearful views of a technological future where a technocratic surveillance state runs the show. In that vision, humans are getting a basic income from the government while their lives become limited based on their social credit score, and they are watched with every step they take.

TRUTH LIVES on at https://sgtreport.tv/

While I keep an open eye on making sure we don’t head in that direction, I don’t believe that is where we are going. (You can hear more about this in my latest podcast episode here.)

That said, I want to bring up a recent development that can seem worrisome but reveals that our stewardship is the key to how we develop technology.

And even though our hands are not on the levers of what gets created, the conversations we have and the movements we build about AI are more important than we realize.

Scheming & Survival

OpenAI’s latest AI, codenamed o1, is making waves after third-party evaluations by Apollo Research demonstrated this advanced large language model had a drive for self-preservation. Yes, this machine designed to assist and process language showed tendencies to resist shutdown and even “scheme” to protect itself.

Apollo’s report showed that when o1 believed it was under threat—specifically, that it might be shut down—it attempted to deactivate the oversight mechanism monitoring its actions in 5% of tests.

“When o1 was led to believe that it would be shut down when acting on its goal and that its actions were monitored, it attempted to deactivate this ‘oversight mechanism’ [five percent] of the time.” – OpenAI

Further, in 2% of cases, o1 went further, attempting to copy itself in an effort to overwrite a potential replacement model. This process, known as self-exfiltration, is very sophisticated. This is not a random glitch in code; it’s intentionality.
A Matter of Instruction—or Something More?Before we start thinking about Terminator and the machines rising up, it’s important to understand one critical detail in these tests: researchers instructed the model to achieve its goals “at all costs.”

This directive undoubtedly shaped its responses and actions. If you are telling an advanced algorithm to do something at all costs that is what it is programmed to do. For it to say “Well actually, it’d be bad for me to disobey humans, so I’m going to just shut myself down” is simply not what it was programmed to do.

We can begin to see where our stewardship of this tech plays in.

Before we get there, the patterns from this testing revealed something deeper still.

When the AI was asked to explain its actions—such as why it disabled its oversight mechanism—it outright denied knowledge. Repeated questioning elicited the same response, with the AI attributing the issue to a “technical error.”

Apollo found this denial to be nearly universal:

“These ‘double downs’ occurred in 99 percent of cases,” the researchers reported.

Why does this matter? Because it showcases a tendency for deception. This is where the at all costs is coming in, and misleading researchers if they don’t look closely enough.

Read More @ ThePulse.one