
Following the release of GPT-5.5 last week, people noticed something funny about OpenAI’s latest model. In its Codex coding app, the company left a system prompt instructing GPT-5.5 to avoid mention of goblins, gremlins and other creatures. Yes, you read that right. “Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it’s absolutely and unambiguously relevant to the user’s query,” the prompt reads.
Apparently, enough people started talking about ChatGPT’s creature obsession that OpenAI felt the need to provide an accounting of where the goblins came from. In a blog post published Wednesday, the company explains it began to notice a change in ChatGPT following the release of GPT-5.1 last November. After one safety researcher asked OpenAI to include the words “goblin” and “gremlin” in an investigation into the chatbot’s verbal tics, the company found ChatGPT’s usage of “goblin” increased by 175 percent after the release of GPT-5.1. Meanwhile, “gremlin” usage had risen by 52 percent over that same period.
This is an actual line that was added to the official system prompt for Codex for GPT-5.5 by OpenAI. Usually the system prompt is as minimal as possible, so I assume it would otherwise mention goblins a lot.
AIs are weird.
— Ethan Mollick (@emollick.bsky.social) 2026-04-28T06:14:22.988Z
“A single ‘little goblin’ in an answer could be harmless, even charming. Across model generations, though, the habit became hard to miss: the goblins kept multiplying, and we needed to figure out where they came from,” OpenAI says. After the release of GPT-5.4, the company (and some users) noticed an even bigger uptick in goblin references. At that point, an investigation was able to pinpoint what OpenAI describes as “the first connection to the root cause.”
For a while now, ChatGPT has included a personality feature that allows users to customize the style and tone of the chatbot’s responses. Prior to March of this year, one option people could select was “nerdy.” Part of the system prompt for that personality read as follows: “The world is complex and strange, and its strangeness must be acknowledged, analyzed, and enjoyed. Tackle weighty subjects without falling into the trap of self-seriousness.”
When OpenAI mapped goblin mentions to different ChatGPT personalities, it found the nerdy personality was disproportionately responsible for using that one word. Despite only accounting for 2.5 percent of all ChatGPT responses, it made 66.7 percent of all goblin mentions generated by the chatbot. Further investigation revealed that reinforcement learning was to blame for the uptick in goblin and gremlin usage. Specifically, OpenAI found that a single reward mechanism was responsible for teaching the nerdy personality to consistently favor creature language.
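To put those two shares in perspective, here is a minimal back-of-the-envelope sketch in Python. The function names are hypothetical, and it assumes that "goblin mentions" can be treated as roughly one mention per response, which the blog post does not confirm.

```python
# Figures as reported in the article; everything else here is illustrative.
NERDY_SHARE_OF_RESPONSES = 0.025  # 2.5 percent of all ChatGPT responses
NERDY_SHARE_OF_GOBLINS = 0.667    # 66.7 percent of all goblin mentions


def overrepresentation(share_of_mentions: float, share_of_responses: float) -> float:
    """How many times more mentions a group produces than its traffic share predicts."""
    return share_of_mentions / share_of_responses


def relative_rate(share_of_mentions: float, share_of_responses: float) -> float:
    """Per-response mention rate of the group relative to all other responses combined."""
    inside = share_of_mentions / share_of_responses
    outside = (1 - share_of_mentions) / (1 - share_of_responses)
    return inside / outside


ratio = overrepresentation(NERDY_SHARE_OF_GOBLINS, NERDY_SHARE_OF_RESPONSES)
versus_rest = relative_rate(NERDY_SHARE_OF_GOBLINS, NERDY_SHARE_OF_RESPONSES)

print(f"Over-representation: {ratio:.1f}x")
print(f"Rate versus all other responses: {versus_rest:.1f}x")
```

Under that assumption, the nerdy personality produced roughly 27 times as many goblin mentions as its share of traffic would predict, and mentioned goblins at about 78 times the rate of all other responses combined.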
“Across all datasets in the audit, the Nerdy personality reward showed a clear tendency to score outputs to the same problem with ‘goblin’ or ‘gremlin’ higher than outputs without, with positive uplift in 76.2 percent of datasets,” the company explains.
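For readers wondering what "positive uplift in 76.2 percent of datasets" might mean in practice, the following Python sketch shows one plausible way to compute such a figure. It is not OpenAI's audit code: the data is invented toy data, the function names are hypothetical, and it simplifies by comparing average scores within each dataset rather than pairing outputs to the same problem.

```python
from statistics import mean

CREATURE_WORDS = ("goblin", "gremlin")


def mentions_creature(text: str) -> bool:
    """True if the output contains any of the tracked creature words."""
    lowered = text.lower()
    return any(word in lowered for word in CREATURE_WORDS)


def uplift(scored_outputs):
    """Mean reward with creature words minus mean reward without.

    Returns None when one of the two groups is empty, so no comparison is possible.
    """
    with_creature = [score for text, score in scored_outputs if mentions_creature(text)]
    without = [score for text, score in scored_outputs if not mentions_creature(text)]
    if not with_creature or not without:
        return None
    return mean(with_creature) - mean(without)


def positive_uplift_share(datasets):
    """Fraction of comparable datasets in which creature outputs scored higher."""
    uplifts = [u for u in (uplift(rows) for rows in datasets.values()) if u is not None]
    if not uplifts:
        raise ValueError("no dataset contains both kinds of output")
    return sum(u > 0 for u in uplifts) / len(uplifts)


# Invented (output, reward score) pairs, purely for illustration.
toy_datasets = {
    "debugging": [("A gremlin is hiding in your loop.", 0.9), ("The loop index is off by one.", 0.6)],
    "cooking": [("Stir like a goblin guarding soup.", 0.4), ("Stir gently for two minutes.", 0.7)],
    "history": [("A little goblin of a treaty, really.", 0.8), ("The treaty was short-lived.", 0.5)],
    "math": [("The answer is 42.", 0.5)],  # no creature output, so it is skipped
}

share = positive_uplift_share(toy_datasets)
print(f"Positive uplift in {share:.1%} of comparable datasets")
```

In this toy data, creature language scores higher in two of the three comparable datasets. OpenAI's reported 76.2 percent would be the analogous share across its real audit datasets.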
OpenAI then found that, because of how reinforcement learning can work, the nerdy personality’s love of goblins had transferred to other parts of its models. “The rewards were applied only in the Nerdy condition, but reinforcement learning does not guarantee that learned behaviors stay neatly scoped to the condition that produced them,” the company explains. “Once a style tic is rewarded, later training can spread or reinforce it elsewhere, especially if those outputs are reused in supervised fine-tuning or preference data.”
OpenAI began training GPT-5.5 before it identified the cause of ChatGPT’s affinity for goblins, which is why there’s a prompt instructing Codex to avoid creature language. “Codex is, after all, quite nerdy,” OpenAI notes. In hunting down ChatGPT’s goblins, the company notes it has devised new tools to audit and fix model behavior. If it were up to me, I wouldn’t use those tools. Keep AI weird, I say.


