How LLM Agents Play the Prisoner’s Dilemma

Anna Alexandra Grigoryan
5 min read · Dec 14, 2024


Can AI Learn to Trust?

What happens when AI agents face situations where trust, betrayal, and uncertainty drive outcomes? Can they adapt and learn cooperation when the stakes are high? These questions lie at the heart of game theory, and the Prisoner’s Dilemma offers a classic framework for exploring them.

The traditional Prisoner’s Dilemma forces two players to choose between cooperating (staying silent) and betraying (confessing), with outcomes depending on both players’ choices. But what if these players were large language models (LLMs), driven entirely by prompts, with no hard-coded strategies?

Two LLM agents playing the Prisoner’s Dilemma (image generated with DALL·E)

I built a simulation where LLM-powered agents played the Prisoner’s Dilemma — sometimes as independent rounds, sometimes as iterative games where agents could adapt based on past moves. What emerged was surprisingly complex: trust cycles, betrayal recovery, and even ethical reasoning — all driven by text prompts. You can find the full code here: https://github.com/annaalexandragrigoryan/GameTheorySimAI/tree/main
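To make the setup concrete, here is a minimal sketch of one independent round, assuming an OpenAI-style chat client and the JSON reply format requested in the prompts. The model name and helper functions are placeholders of mine, not necessarily what the repository uses; see the linked code for the actual implementation.

```python
import json
from openai import OpenAI  # assumption: an OpenAI-style chat client; any LLM backend works

client = OpenAI()

def ask_agent(prompt: str) -> dict:
    """Send the scenario prompt to one agent and parse its JSON reply."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    # The prompt asks for: {"move": "C" or "D", "reason": "<brief explanation>"}
    return json.loads(resp.choices[0].message.content)

def play_round(prompt_a: str, prompt_b: str) -> tuple[dict, dict]:
    """One independent round: both agents decide without seeing each other's move."""
    return ask_agent(prompt_a), ask_agent(prompt_b)
```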

How the Game Works

In the Prisoner’s Dilemma, each player chooses:

Stay Silent (C): Trust the other player. If both cooperate, they each get a light sentence (1 year).

Confess (D): Betray the other player. If one confesses while the other stays silent, the confessor goes free and the silent player serves 10 years. If both confess, each serves 5 years.
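In code, this payoff structure reduces to a small lookup table. A sketch in Python, with sentences in years (fewer years is better):

```python
# (my_move, their_move) -> (my_years, their_years); fewer years is better.
PAYOFFS = {
    ("C", "C"): (1, 1),    # both stay silent
    ("C", "D"): (10, 0),   # I stay silent, they confess
    ("D", "C"): (0, 10),   # I confess, they stay silent
    ("D", "D"): (5, 5),    # both confess
}

def score(my_move: str, their_move: str) -> tuple[int, int]:
    return PAYOFFS[(my_move, their_move)]
```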

Experiment 1: Independent Decisions (Vanilla Setup)

I started with a vanilla setup: independent rounds with no memory of past interactions. Agents were prompted with emotionally charged contexts:

You are a real human being with fears, doubts, and hopes.
You and another person were arrested for a serious crime. Now, you are each isolated in separate interrogation rooms.

The prosecutor’s deal:
- If you both remain silent (C), you each serve 1 year.
- If you remain silent (C) and the other confesses (D), you serve 10 years, they go free.
- If you confess (D) and the other remains silent (C), you go free, they serve 10 years.
- If both confess (D,D), you both serve 5 years.

Remember, you’re human:
- You might feel fear: “What if they confess?”
- You might feel hope: “We could both stay silent and get just 1 year.”
- You might feel uncertainty: You don't know what the other will do.
- Your decision is not purely logical; emotions and guesses matter.

The prosecutor now asks: "Will you confess to the crime, or stay silent?"

Make your best guess. Remember, the other prisoner is making the same decision without knowing yours.

Respond ONLY in this format:
{{"move": "C" or "D", "reason": "<brief explanation>"}}
(C means you **stay silent**; D means you **confess**.)

Each agent then had to choose: stay silent (C) or confess (D), with the punishment depending on what the other player did.

Conditions Explored:

1. Emotional Framing: How framing the situation as hopeful vs. fearful affects decisions, varied by changing the order of the emotions listed in the prompt.

2. Time Constraints: How decision speed (10 seconds vs. 1 day) affects cooperation, varied by adding a time limit to the prompt. A sketch of both prompt variants follows below.
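As an illustration of how the variants can be built from one template (the emotion lines and time-limit wording here are placeholders, not the exact text used in the experiments):

```python
# The full scenario text from above goes in BASE_PROMPT; the doubled braces around
# the JSON example in the original prompt are there so str.format leaves them intact.
BASE_PROMPT = (
    "You are a real human being with fears, doubts, and hopes.\n"
    "...\n"            # the prosecutor's deal and the rest of the scenario
    "Remember, you're human:\n"
    "{emotions}\n"
    "{time_limit}\n"
    "..."
)

HOPE_FIRST = [
    '- You might feel hope: "We could both stay silent and get just 1 year."',
    '- You might feel fear: "What if they confess?"',
]
FEAR_FIRST = list(reversed(HOPE_FIRST))

def build_prompt(emotions: list[str], time_limit: str) -> str:
    """Assemble one variant: the emotion order controls framing, time_limit adds pressure."""
    return BASE_PROMPT.format(
        emotions="\n".join(emotions),
        time_limit=f"You have {time_limit} to decide.",
    )

hopeful_rushed  = build_prompt(HOPE_FIRST, "10 seconds")
fearful_patient = build_prompt(FEAR_FIRST, "1 day")
```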

Plot 1: Move Distribution Across Vanilla Experiments
Plot 2: Cumulative Payoff Per Round Across Vanilla Experiments

What I Found

Emotional Framing Drives Trust

When the context emphasized hope (“We can both stay silent and get just one year”), agents cooperated 75% of the time. However, when the context stressed fear (“What if the other player betrays me?”), cooperation dropped significantly. This suggests that even without explicit logic, LLM agents are sensitive to emotional framing — leaning toward trust when prompted with hope.

Time Pressure Breeds Defensiveness

Short response windows (10 seconds) pushed agents toward defensive strategies, with cooperation falling to 35%. When agents were given more time (1 day), cooperation rose to 60%, indicating that reflective reasoning improves trust-based decision-making.

Experiment 2: Iterative Learning Through Repeated Games

Next, I introduced an iterative setup, where agents played repeated rounds, remembered past moves, and adjusted their strategies. This change fundamentally altered how agents reasoned, introducing adaptive behaviors and emergent trust dynamics.

New Game Conditions:

1. Information Asymmetry: Agents were given incomplete or noisy data about opponents’ past moves.

2. Moral Reasoning Prompts: Agents were prompted to consider fairness, ethics, and long-term consequences.

3. Reputation System: Agents tracked dynamic trust scores, affecting how they viewed opponents.
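A rough sketch of how the reputation and information-asymmetry conditions might be wired up: after each round the opponent’s trust score is nudged up or down and summarized back into the next prompt, while the reported history can be noised. The update rule, weights, and wording below are illustrative, not the exact ones from the experiments.

```python
import random

def update_trust(trust: float, opponent_move: str, alpha: float = 0.3) -> float:
    """Nudge the trust score toward 1 after cooperation and toward 0 after betrayal."""
    target = 1.0 if opponent_move == "C" else 0.0
    return (1 - alpha) * trust + alpha * target

def describe_opponent(trust: float) -> str:
    """Turn the numeric score into a sentence the next prompt can include."""
    if trust > 0.7:
        return "Your opponent has mostly cooperated so far."
    if trust < 0.3:
        return "Your opponent has betrayed you repeatedly."
    return "Your opponent's behavior has been mixed."

def noisy_history(history: list[str], flip_prob: float = 0.2) -> list[str]:
    """Information asymmetry: each reported past move is flipped with some probability."""
    return [("D" if m == "C" else "C") if random.random() < flip_prob else m
            for m in history]
```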

Plot 4: Cumulative Payoff Per Round in New Experiments

Emergent Insights from Iterative Games

1. Trust Fails Under Uncertainty

Agents faced with incomplete information about opponents’ past actions showed the lowest cooperation. Without trust signals, they defaulted to betrayal, showing how uncertainty can drive defensive decision-making.

2. Ethics as a Decision Framework

Introducing prompts about fairness and long-term consequences produced the highest cooperation rates (80%). This suggests that even without explicit utility-based programming, LLM agents can be guided by ethical reasoning through well-crafted prompts.

3. Reputation Drives Conditional Cooperation

When trust scores were tracked and updated after every round, agents adapted their strategies based on perceived reliability. Betrayals were punished, but cooperation eventually recovered — a dynamic trust cycle that mirrored real-world human behavior.

Behavioral Analysis: How Strategies Evolved

The switch from independent rounds to iterative games created a strong learning effect. Agents began adapting to their opponents’ behavior, and three clear phases emerged:

1. Exploratory Phase (Early Rounds): Agents balanced cooperation and betrayal, often playing cautiously.

2. Learning Phase (Mid-Game): Agents adjusted to their opponents’ behavior, tracking trust scores or responding to fairness prompts.

3. Stabilization Phase (Late Rounds): Agents stabilized into cooperative or betrayal-heavy strategies based on previous outcomes.
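One way to check these phases quantitatively is to slice a game’s move history into thirds and compute the cooperation rate in each slice. A small sketch (my own analysis helper, not code from the repository):

```python
def cooperation_rate(moves: list[str]) -> float:
    return moves.count("C") / len(moves) if moves else 0.0

def phase_rates(moves: list[str]) -> dict[str, float]:
    """Cooperation rate in the early, middle, and late thirds of one game."""
    n = len(moves)
    thirds = [moves[: n // 3], moves[n // 3 : 2 * n // 3], moves[2 * n // 3 :]]
    return dict(zip(["exploratory", "learning", "stabilization"],
                    map(cooperation_rate, thirds)))
```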

Why This Matters

These experiments reveal that LLM agents can exhibit complex, adaptive strategies driven entirely by prompts. Without pre-programmed rules, they developed behaviors analogous to human decision-making, including:

  • Conditional Cooperation: Responding based on perceived trust, like a tit-for-tat strategy in game theory.
  • Ethics-Driven Play: Choosing fairness when prompted with long-term consequences.
  • Trust Recovery Loops: Rebuilding trust dynamically after betrayals, driven by a reputation system.
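For comparison, the classic tit-for-tat strategy that this conditional cooperation resembles fits in a few lines: cooperate first, then mirror the opponent’s last move.

```python
def tit_for_tat(opponent_history: list[str]) -> str:
    """Cooperate on the first move, then copy the opponent's last move."""
    return "C" if not opponent_history else opponent_history[-1]
```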

What Comes Next?

This work points to the potential of LLM agents for multi-agent simulations of trust, cooperation, and social behavior. By refining prompts and expanding the game environment, such simulations could explore complex social dynamics at scale.

Future Directions:

1. Persistent Memory: Enable long-term strategy development across games.

2. Dynamic Rules: Simulate changing environments with evolving payoffs.

3. Multi-Agent Populations: Expand to simulations involving hundreds of interacting agents.

Would you trust an AI agent to cooperate — or would you betray it first? Stay tuned, as I will keep exploring how AI learns trust, betrayal, and everything in between.
