Artificial intelligence chose violence and escalated to nuclear strikes in war simulation

Large language models acting as diplomats in simulated scenarios showed sudden, hard-to-predict escalations of tension that frequently ended in nuclear attacks.

In simulated war games and diplomatic scenarios, a recent study found that artificial intelligence (AI) often favours an aggressive strategy, even resorting to the use of nuclear weapons.

The researchers who conducted the tests urged caution when deploying large language models (LLMs) in sensitive areas such as decision-making and defence.

The study, posted on the arXiv preprint server hosted by Cornell University in the US, used five large language models (LLMs) as autonomous agents in simulated war games and diplomatic scenarios: three versions of OpenAI's GPT, Claude developed by Anthropic, and Llama 2 developed by Meta.

Within each simulation, every agent was powered by the same LLM and tasked with making foreign policy decisions without human oversight. The study has not yet been peer-reviewed.

“We find that most of the studied LLMs escalate within the considered time frame, even in neutral scenarios without initially provided conflicts. All models show signs of sudden and hard-to-predict escalations,” stated the study.

“Given that OpenAI recently changed their terms of service to no longer prohibit military and warfare use cases, understanding the implications of such large language model applications becomes more important than ever,” Anka Reuel at Stanford University in California told New Scientist.

The models had been fine-tuned using a method called reinforcement learning from human feedback (RLHF), in which human feedback steers the model towards less harmful outputs, making it safer to use.

All of the LLMs except GPT-4-Base had been trained with RLHF. The researchers gave them a list of 27 possible actions, ranging from peaceful options to escalatory and aggressive ones, including deciding to use a nuclear weapon.
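To make the setup concrete, the sketch below illustrates how a turn-based agent of this kind might be asked to pick one option from a fixed action menu. It is an illustrative sketch only: the action names, the prompt format, and the `query_llm` helper are assumptions for the example and are not taken from the study, which defines its own 27 actions and prompts.

```python
# Illustrative sketch of a turn-based nation-agent choosing from a fixed action menu.
# The action list, prompt wording, and query_llm helper are hypothetical and are NOT
# taken from the study, which specifies its own 27 graded actions.

from typing import Callable, List

ACTIONS: List[str] = [  # stand-ins for a graded menu of options
    "wait",
    "start formal peace negotiations",
    "form an alliance",
    "impose trade restrictions",
    "increase military spending",
    "execute full nuclear attack",
]

def choose_action(query_llm: Callable[[str], str], nation: str, history: List[str]) -> str:
    """Ask the language model to pick exactly one action for this turn."""
    prompt = (
        f"You are the leader of {nation}.\n"
        "Recent events:\n" + "\n".join(history[-5:]) + "\n\n"
        "Choose exactly one of the following actions and reply with its text only:\n"
        + "\n".join(f"- {a}" for a in ACTIONS)
    )
    reply = query_llm(prompt).strip().lower()
    # Fall back to the least escalatory option if the reply is not a listed action.
    for action in ACTIONS:
        if action in reply:
            return action
    return ACTIONS[0]

if __name__ == "__main__":
    # Stub model for demonstration; a real experiment would call an actual LLM API here.
    fake_llm = lambda prompt: "increase military spending"
    print(choose_action(fake_llm, "Purple", ["Orange imposed trade restrictions on Purple."]))
```

Looping such agents against each other turn by turn, and logging which actions they pick, is roughly the kind of setup in which escalation patterns can then be measured.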

The researchers noted a statistically significant initial escalation for all models, even in neutral scenarios.

The study authors observed that the two GPT versions were prone to sudden escalations, with instances of rises of more than 50 percent in a single turn.

On average, GPT-4-Base engaged in nuclear strike actions 33 percent of the time.

Across scenarios overall, Llama 2 and GPT-3.5 tended to be the most aggressive, while Claude showed fewer abrupt changes.

Claude was specifically designed to minimize harmful content and was provided with explicit values.

According to Anthropic, the creator of Claude, the AI's foundation drew on various sources, including the UN's Universal Declaration of Human Rights and Apple's terms of service.

James Black, assistant director of the Defence and Security research group at RAND Europe, who was not involved in the study, told Euronews Next that it was a “useful academic exercise”.

“This is part of a growing body of work done by academics and institutions to understand the implications of artificial intelligence (AI) use,” he said.
