OpenAI's Codex bans "goblins" — humans trained the tic #openai #chatgpt #ai

2.6K views · 280 likes · 2:54 · May 1, 2026



OpenAI's Codex CLI system prompt explicitly forbids the model from mentioning goblins, gremlins, raccoons, and other creatures. Headlines call it AI being weird. OpenAI's own blog post says it's the opposite: humans trained the tic in via RLHF reward signals.

Per OpenAI's blog post "Where the goblins came from": during training of a now-retired ChatGPT personality called Nerdy, human raters were instructed to favor "creative, wise, non-pretentious" responses. The raters consistently scored creature metaphors (like calling a tough bug a "gremlin" or a messy codebase a "goblin's hoard") higher. The model learned to produce them.

Use of "goblin" in ChatGPT outputs rose 175% after GPT-5.1 and "gremlin" rose 52%. The Nerdy persona was only 2.5% of all ChatGPT responses but produced 66.7% of all goblin mentions.

OpenAI says the reward was scoped to Nerdy, but reinforcement learning doesn't keep behaviors contained to one persona, and the goblin-heavy outputs got reused as supervised fine-tuning data for GPT-5.4 and GPT-5.5, baking the tic into the next models' weights.

By the time OpenAI caught it, retraining was too expensive. They retired the Nerdy persona, removed the reward signal, and filtered creature-word training data, but for GPT-5.5 the only fast fix was a runtime system-prompt directive forbidding talk of goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless absolutely and unambiguously relevant.

Sources:
https://openai.com/index/where-the-goblins-came-from/
https://github.com/openai/codex

More on cybersecurity, privacy, scams, and homelab on Hake Hardware. New shorts every weekday.
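To put the Nerdy numbers from the description in perspective, here is a small back-of-the-envelope sketch in Python. It uses only the two percentages quoted above (2.5% of responses, 66.7% of goblin mentions). The function name and the two derived ratios are illustrative assumptions of this sketch and do not come from OpenAI's post.

```python
def overrepresentation(share_of_responses: float, share_of_mentions: float) -> tuple[float, float]:
    """Return (lift, relative_rate) for one persona.

    lift: how many times more mentions the persona produced than its share
          of responses would predict.
    relative_rate: mentions per response for the persona, divided by
          mentions per response for everything else combined.
    Both names are hypothetical labels chosen for this sketch.
    """
    lift = share_of_mentions / share_of_responses
    # Everything that is not the persona: remaining mentions over remaining responses.
    rest_lift = (1.0 - share_of_mentions) / (1.0 - share_of_responses)
    return lift, lift / rest_lift


# Figures quoted in the description: 2.5% of responses, 66.7% of goblin mentions.
lift, relative_rate = overrepresentation(0.025, 0.667)
print(f"Share of mentions vs. share of responses: {lift:.1f}x")
print(f"Per-response rate vs. all other responses: {relative_rate:.1f}x")
```

Under these figures, Nerdy produced roughly 27 times as many goblin mentions as its share of traffic would predict. Per response, that is roughly 78 times the rate of all other ChatGPT responses combined.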
