Mr. Aayush Bhatt
June 10, 2026 · 8 min read
The Pentagon Is Testing OpenAI and Google Models to Replace Anthropic's Claude in Classified Systems
The Pentagon blacklisted Anthropic in February 2026 over safety guardrails. Now OpenAI, Google, and xAI are competing to fill the gap in classified military systems.
Introduction: When a Safety Policy Becomes a Security Threat
In July 2025, Anthropic secured what looked like a landmark moment in AI history. The company signed a contract worth up to $200 million with the United States Department of Defense, making its Claude model the first frontier AI system approved for deployment on classified military networks. It was a validation of Anthropic's technology and, the company hoped, proof that responsible AI could serve national security interests without sacrificing its principles.
By February 27, 2026, that relationship was over. President Donald Trump ordered all federal agencies to immediately cease using Anthropic's technology. Defense Secretary Pete Hegseth designated Anthropic a "supply-chain risk to national security" (a label previously reserved for foreign adversaries like Huawei and ZTE), and the Pentagon gave itself six months to remove Claude from its classified systems entirely. Three days later, the military began formally testing replacement AI models from OpenAI, Google, and Elon Musk's xAI.
What happened in between is one of the most consequential confrontations between artificial intelligence ethics and state power in recent memory. Understanding it requires understanding what Claude was actually doing inside the Pentagon, and what Anthropic refused to let it do.
What Claude Was Doing Inside the Pentagon
Claude was not a peripheral tool inside military operations. According to Anthropic's own legal filings, Claude was the Department's most widely deployed and used frontier AI model. It was integrated into the Maven Smart System, a digital mission control platform developed with Palantir that serves over 25,000 users across US Combatant Commands worldwide. The system handles intelligence analysis, operational planning, modeling and simulation, cyber operations, and mission-critical decision support across sensitive and classified environments.
The scale of that deployment only became clear when the fallout began. Sources cited by Bloomberg reported that Claude had been used during classified operations against Iran, including intelligence analysis connected to strike planning. On the first day of those operations alone, according to military sources, Claude generated approximately 1,000 prioritized strike targets, complete with GPS coordinates, weapons recommendations, and automated legal justifications for each strike. The system had become deeply embedded in the machinery of military decision-making, and nobody outside a very small circle knew the full extent of it until the contract collapsed.
The original agreement required the Pentagon to abide by Anthropic's Acceptable Use Policy, which contained two firm prohibitions: Claude could not be used for mass domestic surveillance of Americans, and it could not be deployed in fully autonomous weapons systems that select and engage targets without meaningful human intervention. Those two lines were the foundation of Anthropic's safety commitments. They were also the two lines the Pentagon eventually insisted on crossing.
The Standoff: Safety Guardrails vs. Unrestricted Military Access
Tensions between Anthropic and the Pentagon had been building since early in the contract. Pentagon officials reportedly pushed for a contract clause that would authorize Claude for "any lawful use." In Anthropic's reading, that language would effectively grant the military permission to deploy the model for domestic mass surveillance and for lethal targeting in fully autonomous weapons without requiring a human to make the final decision. Anthropic CEO Dario Amodei declined to accept those terms.
The standoff escalated in public view in January 2026, when reports emerged that Claude had been used during the operation to capture Venezuelan President Nicolás Maduro, an operation that raised its own questions about the limits of the existing contract. By mid-February, Bloomberg was reporting that Anthropic's contract extension talks were on hold specifically over the surveillance and autonomous weapons clauses. The Pentagon set a hard deadline of 5:01 p.m. on February 27 for Anthropic to agree to unrestricted access. When Anthropic did not comply, President Trump signed the order to terminate the relationship.
The designation itself was extraordinary. As the law firm Mayer Brown noted in a legal analysis, the supply-chain risk designation is a tool typically reserved for foreign entities whose technology poses a backdoor threat to national security, the kind of label applied to Chinese telecommunications hardware suspected of espionage. Applying it to a US AI company for refusing to waive its own ethical policies was unprecedented. Legal experts immediately flagged what they described as a mismatch between the law being invoked and Anthropic's actual conduct, with some courts subsequently agreeing: a California court found the designation illegally retaliatory on First Amendment grounds, while a separate challenge in the D.C. circuit produced a different outcome. The legal battle is ongoing.
Anthropic sued the Pentagon on March 9, 2026, arguing the designation was unlawful and violated the company's free speech and due process rights. The company said the blacklisting could cost it multiple billions of dollars in 2026 revenue. Microsoft, which depends on Anthropic's technology through its own products, filed a supporting brief in the same week, warning that the designation would cause costly disruptions and rushed rebuilding of products that rely on Claude.
The Evaluation: Testing Rivals in Classified Conditions
With the split formalized, the Pentagon moved fast. Beginning March 1, just three days after Hegseth's designation, the Department stood up a formal evaluation process running through GenAI.mil, a platform built separately from the Maven Smart System where Claude had been operating. Twenty-five designated military "power users" spread across five global theater commands were assigned to evaluate the competing models, feeding them identical operational prompts to measure comparative performance on the kinds of tasks Claude had been handling.
The models under evaluation include offerings from OpenAI and Google, along with xAI's Grok. The Pentagon also signed agreements with eight new AI contractors in the weeks following the split, formally beginning the commercial transfer away from Anthropic. OpenAI signed its own agreement quickly enough that CEO Sam Altman later acknowledged the deal was "rushed." Emil Michael, the undersecretary of defense for research and engineering, told Bloomberg Television that talks with Anthropic remain suspended because of the company's legal challenge, and that the Pentagon expects new competitive model releases from rival companies every one to two months.
The transition, however, is not simple. Claude was the only frontier AI deployed on classified DoD networks, and its integration with AWS infrastructure runs deep. Joe Saunders, CEO of RunSafe Security, warned that replacing it would not be a straightforward software swap: the models are embedded across workflows, security-accredited environments, and mission-specific processes, and each replacement must go through its own accreditation and integration cycle. Defense One sources put the realistic replacement timeline at twelve months or longer, well beyond the Pentagon's publicly stated six-month window. Pentagon staff themselves have reportedly resisted the change, with some arguing that Claude outperforms the competing models and that the transition will degrade capability during the evaluation period.
What This Means for OpenAI, Google, and the Future of AI in Defense
For OpenAI and Google, the evaluation is an opportunity and a test. Both companies have previously faced internal and public criticism for pursuing defense contracts, but both have continued to grow their government and military relationships. OpenAI's agreement with the Pentagon signals a clearer willingness than Anthropic's to accept the kinds of broad-use language that the military prefers. Critics from the Electronic Frontier Foundation and The Intercept noted that OpenAI's existing government contract language is broader than Anthropic's was, though the full terms have not been made public.
For Anthropic, the episode is a painful demonstration of the tension between building a company on safety principles and surviving as a commercial enterprise that depends on large contracts. Consumer response to the Pentagon dispute was sharply favorable: over a million new users signed up for Claude daily in the weeks following the blacklisting announcement, making it the top-ranked AI app in more than twenty countries, apparently as a direct reaction to Anthropic's refusal to compromise on its stated values. That public support does not replace billions in government revenue, but it does suggest that Anthropic's brand identity as the safety-focused lab has real commercial value beyond the defense market.
The broader implication for the AI industry is harder to ignore. This dispute drew a clear line between AI companies willing to accept government control over how their models are used, and those that are not. Every major AI lab now knows what the Pentagon's terms look like. The ones competing to fill Anthropic's role have implicitly signaled which side of that line they stand on.
Conclusion: The Line That Changed the Industry
The Pentagon's split with Anthropic is not simply a procurement story. It is a precedent-setting confrontation over a question that will define the AI industry for years: do AI companies have the right to say no to governments that want to use their technology for purposes those companies consider harmful?
Anthropic said yes, they do. The Pentagon said no, they do not, and used the most powerful designation available to punish a domestic company for holding that position. The legal battle is still unresolved. The technical replacement is still incomplete. And the twenty-five power users running prompts through OpenAI and Google models on classified networks are working through a question that no benchmark test can fully answer: whether a model willing to do anything is actually better, in a military context, than a model that knows what it will not do.
That question will shape not just which AI company wins a defense contract. It will shape what AI in military systems means for everyone.
Written by
Mr. Aayush Bhatt
Software Engineer interested in how models work and where they fail.