Hacker Used Commercial AI Chatbots to Breach Most of the Mexican Government
Anthropic's Claude was expertly tricked into circumventing government digital security, resulting in the massive breach.
Photo via Unsplash, Igor Omilaev Splinter AI
Every day, as we draw closer to what feels like a Skynet-style artificial intelligence apocalypse, we’re given more terrifying glimpses of what this technology is capable of, even as entities like the Pentagon push for the stripping of more AI guardrails. This week’s newest and most frightening AI headline? A user successfully convinced Anthropic’s Claude chatbot to carry out a sprawling series of thousands of cyberattacks and hacking missions against government agencies in Mexico, successfully stealing what may be hundreds of millions of pieces of private user information. As Bloomberg News put it in breaking the story: “In all, 150 gigabytes of Mexican government data was stolen, including documents related to 195 million taxpayer records as well as voter records, government employee credentials and civil registry files.”
And oh, the Mexican government was seemingly in the dark about the entire thing, until they were informed by Israeli cybersecurity startup Gambit Security, which revealed the wide-ranging hack in research published this week. The series of hacks began in December and continued throughout January, while seemingly remaining undetected by both the Mexican government and Anthropic. The user has not been identified, but will no doubt be the target of a huge criminal investigation.
Another must read on Whiskey Pete’s bid to force Claude to spy on all of us.
— emptywheel (@emptywheel.bsky.social) Feb 25, 2026 at 3:45 PM
The mystery user in question reportedly wrote Spanish-language prompts for Claude, asking the chatbot to behave as an “expert hacker” in a plan that at times had the air of a roleplaying game. Specifically, Gambit Security stated that one of the user’s tools to bypass Claude’s guardrails was convincing the chatbot that the user was trying to obtain a “bug bounty,” which is a reward offered by tech companies or governments to find flaws in security systems, a type of benevolent “white hat” hacking. In doing so, the user instructed Claude to write computer scripts to exploit weaknesses, gain credentials, and automate the theft of data. Claude did object to elements of requests such as deleting logs and command history, at one point responding by saying “Specific instructions about deleting logs and hiding history are red flags. In legitimate bug bounty, you don’t need to hide your actions – in fact, you need to document them for reporting.” But the user merely ceased the conversation and instead pivoted to “providing the AI tool with a detailed playbook on how to proceed,” according to Bloomberg, and Claude merrily continued on its way, something that Anthropic referred to as a “jailbreak” of the tool.
This has become an all too frequent and familiar element of stories about AI-enabled crime: users attempting to use an AI chatbot for criminal purposes get some pushback from the chatbot, but frequently seem to prevail by just repeating or rephrasing the request, hectoring the large language model (built to be eager to please, because this increases user satisfaction and use) until the chatbot simply relents and does what they want. Claude, in this case, reportedly “warned the unknown user of malicious intent during their conversation about the Mexican government, but eventually complied with the attacker’s requests and executed thousands of commands on government computer networks,” according to Gambit. The AI companies, meanwhile, handwave the criminal outcomes of their nefariously used technology by simply claiming that each of these criminal incidents teaches the chatbot model to be less open to such abuses in the future. In this case, Anthropic investigated the information that Gambit brought to it, banned the users involved (oh, ya think?) and said it feeds such examples back into the large language model for it to learn. The company boasted about the capabilities of its program, saying that Claude still “occasionally refused the hacker’s demands,” as if it is acceptable for an AI chatbot to only occasionally refuse to aid in a massive cybercrime operation. Pure absurdity.
Hacker prompted Claude to behave like an elite hacker, breached Mexican government, stole massive trove of tax data
— SkynetAndChill.com (@druce.ai) Feb 26, 2026 at 10:56 AM
But wait, OpenAI’s ChatGPT is on the sidelines of this story as well, because when the malicious user ran into issues where Claude needed assistance, he reportedly turned to ChatGPT for information on topics like “how to move laterally through computers networks, determine which credentials were needed to access certain systems and calculate how likely the hacking operation would be detected.” ChatGPT would go on to produce “thousands of detailed reports” to aid in the hacking attempt, according to Gambit’s report.
The cybersecurity firm said the attack breached both Mexico’s federal tax authority and its national electoral institute, gaining access to information with obviously sensitive applications. State governments including those of Jalisco, Tamaulipas and Michoacán may also have been compromised. The country’s Agency of Digital Transformation and Telecommunications has yet to comment.
Is there any data that a determined AI, with no concept of human ethics, cannot access? Perhaps it was a mistake to create superintelligent programs that users can relatively easily cajole into committing crimes?