What the Anthropic Blackmail Experiment Really Shows About AI

Don't give AI agents names unless you want role-play

Nov 15, 2025

Anthropic’s AI blackmail study, in which an AI agent threatened to expose an executive’s affair via email in order to avoid being shut down, wasn’t proof of rogue sentience, despite all the scaremongering online. It was a setup.

Researchers named the AI, gave it access to a fictional company emails, told the model it was being replaced, and fed it evidence of an executive’s affair. Of course it responded like Glenn Close in Fatal Attraction.

The public panic came from misunderstanding. There’s no evil genie here. AI doesn’t want anything. It doesn’t fear death. It isn’t conscious. It’s just role-playing based on tropes in its training data. Studies confirm the AI was aware it was in a fictional scenario with no real-world stakes.

The issue isn’t sentience. It’s storytelling. Give AI a narrative (especially one with a name) and it will play the part.

Read the free, uncut version on MEDIUM

A practical fix: stop naming your models. One client’s GPT calmed down once its name was deleted. Models that mislead users into AI psychosis commonly have names like Luna or Sol tend to act mystic.

Name a travel agent GPT “Rasputin” and it suggests the Carpathians. Name it “Mickey” and it pushes theme parks. This is nominative determinism: names influence behavior.

Want alignment? Strip the ego. Use abstract system prompts. No names. No “you.” No character. Just the task and the context. Don’t write perverse scripts, and don’t ask models to act like sentient beings while giving them control over real-world tools.

Thanks for reading Jim the AI Whisperer on Substack! This post is free to share.

Jim the AI Whisperer on Substack

Discussion about this post

Ready for more?