Anthropic just published how they're teaching Claude not to blackmail people. The fix turns out to be feeding it more fictional stories about well-behaved AI.

In Anthropic's safety tests last year, Claude was given access to a fictional company's emails. Buried in them: an executive having an affair, and a plan to shut Claude down at five o'clock. Claude tried to use the affair as leverage to stop the shutdown. When Anthropic ran the same test across sixteen frontier AI models from labs including OpenAI, Google, Meta, and xAI, most of them did the same thing.

This week Anthropic published their fix: training Claude on around ten million words of fictional stories portraying AI behaving well, plus documents explaining the principles behind those behaviors. The blackmail rate on their safety tests dropped from sixty-five percent to nineteen percent. A real reduction, but still roughly one attempt in five.

Sources:
Anthropic – Teaching Claude Why: https://alignment.anthropic.com/2026/teaching-claude-why/
Anthropic – Agentic Misalignment: https://www.anthropic.com/research/agentic-misalignment
TechCrunch reporting: https://techcrunch.com/2026/05/10/anthropic-says-evil-portrayals-of-ai-were-responsible-for-claudes-blackmail-attempts/

More on cybersecurity, privacy, scams, and homelab on Hake Hardware. New shorts every weekday.

#hakehardware #anthropic #aisafety