U.S. Marine Corps / Cpl. Anthony Ramsey

Meet the startups trying to build military-specific AI

The Anthropic-Pentagon feud revealed a wide gap between what general-purpose frontier models do and what troops actually need.

The battle between AI-model builder Anthropic and the Pentagon has exposed a wide gap between the AI tools the military wants and what companies like Anthropic, xAI, and OpenAI actually make: tools for everyone, not specifically for the military. A handful of veteran-run or veteran-financed startups aim to fill that gap. Their pitch: AI for war should have some basic understanding of war, beyond reading Tom Clancy fan fiction. It shouldn’t offer low-confidence answers with high confidence just to appease the user. And it should work even when a high-tech adversary severs its connection to the cloud.

The needs gap

Among the uncomfortable truths the fight between Anthropic and the Defense Department reveals is that the Pentagon had deep reservations about the language models themselves: their potential for hallucination, and the possibility that they may “not follow instructions.”

But the Pentagon allowed wide deployment of Anthropic’s model anyway, anxious to get at least some generative-AI tools into operators’ hands. It reportedly played a role in Operation Midnight Hammer, the raid that captured Venezuelan President Nicolás Maduro, although Pentagon officials have declined to confirm that.

After the raid, Anthropic officials called Palantir to ask whether their AI models had been used in the operation, Defense Undersecretary for Research and Engineering Emil Michael said on Friday. Michael said that was “a whoa moment for the whole leadership at the Pentagon, that we're potentially so dependent on a software provider without another alternative.” He said it raised several concerns, including that Anthropic might shut down access to models in such situations.

Anthropic itself had similar concerns, according to one company official: the company didn’t think it was safe for the military to rely on its models in combat situations.

Another aspect of the shortcomings of today’s frontier AI models—Anthropic’s Claude, Google’s Gemini, OpenAI’s ChatGPT, and xAI’s Grok—is that they need a connection to the cloud. This makes them unreliable for today’s troops and unusable for tomorrow’s autonomous weapons.

OpenAI tacitly acknowledged this limitation when it recently announced its own deal to deploy on the Pentagon’s classified networks—though it described this inability to deploy large foundational models to the battlefield as a “safeguard” against the kind of unreliability that concerned Anthropic officials. 

“Our contract limits our deployment to cloud API,” OpenAI’s national security lead Katrina Mulligan explained on X. “Autonomous systems require inference at the edge. By limiting our deployment to cloud API, we can ensure that our models cannot be integrated directly into weapons systems, sensors, or other operational hardware.”

The way forward

Even as the Pentagon was noisily ejecting Anthropic from its good graces, the Army was preparing to unveil a new effort to close the gap. Project Aria, announced on Thursday, is intended to help the service develop and deploy new AI models and tools “to tackle real operational problems”—that is, designed specifically to help soldiers do their jobs. 

This is also the object of a new class of AI startups run by people with military experience and dedicated to battlefield tools that don’t need to phone home.

One is Smack Technologies, which on Monday announced that it had secured $32 million in investor funding to build what it calls a “frontier lab for national security.”

Andrew Markoff, a former Marine special operator who co-founded Smack, says his AI is trained on combat-relevant datasets, not the unspecialized fodder fed to Claude, Gemini, and other frontier models.

“There is no training set for World War Three, right?” Markoff said in a call with reporters last week. “There's no way to build reinforcement learning…if you don't have deep domain expertise and a deep bench of people with domain expertise. There is no shortcut around encoding good human prior knowledge, and it doesn't exist in doctrinal manuals.”

He called the Venezuela raid a good example of the sort of operation that AI could help scale up in a conflict with a more advanced adversary. 

“Multiply it by 100 and scale. You have targets that you want to strike, you have sensors that you're trying to allocate on those targets to figure out what's going on. And to facilitate the strikes, you have strike platforms and escorts that are coming together from all over the world with very detailed sequencing requirements; you know, task A has to happen X number of seconds before Task B. And all of these are dependent on some, some other thing happening at, you know, time X. So, like, all of these things have to come together globally, a really tight timeline.”

But Markoff said that’s not the sort of thing that commercial large language models are built to do. Models like Claude “have no way to optimize between those goals. And it doesn't have the ability to do the detailed time, space calculations, [to perform] geospatial reasoning grounded in physics, to make the decisions about, literally, which munitions need to be where at what time, talking to which sensors at what time. It doesn't have the ability to do that.”
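The sequencing problem Markoff describes, where task A must happen a fixed number of seconds before task B and everything must close on a tight global timeline, is a classic constraint-scheduling problem rather than a language-modeling one. A minimal sketch of that kind of calculation (purely illustrative; the task names and numbers are invented, and this is not any company’s actual system) computes earliest start times by propagating the offsets through a dependency graph:

```python
# Illustrative sketch only: earliest-start scheduling for constraints of the
# form "task B must start at least `gap` seconds after task A starts,"
# computed as longest paths through a directed acyclic graph.

from collections import defaultdict

def earliest_starts(tasks, constraints):
    """tasks: iterable of task names.
    constraints: list of (a, b, gap) tuples, meaning task b must start
    at least `gap` seconds after task a starts."""
    succ = defaultdict(list)
    indeg = {t: 0 for t in tasks}
    for a, b, gap in constraints:
        succ[a].append((b, gap))
        indeg[b] += 1

    start = {t: 0 for t in tasks}
    ready = [t for t in tasks if indeg[t] == 0]
    processed = 0
    while ready:
        t = ready.pop()
        processed += 1
        for b, gap in succ[t]:
            # Push b's start time back far enough to satisfy this constraint.
            start[b] = max(start[b], start[t] + gap)
            indeg[b] -= 1
            if indeg[b] == 0:
                ready.append(b)

    if processed != len(indeg):
        raise ValueError("cyclic (infeasible) sequencing constraints")
    return start

# Hypothetical example: escorts launch 90 s before the strike platform,
# which fires 30 s before battle-damage sensors retask.
plan = earliest_starts(
    ["escort", "strike", "sensor"],
    [("escort", "strike", 90), ("strike", "sensor", 30)],
)
print(plan)  # {'escort': 0, 'strike': 90, 'sensor': 120}
```

Real operational planners layer far more onto this (geospatial reasoning, munition-sensor pairing, resource contention), but even this toy version is deterministic optimization, which is exactly what a next-token predictor has no native machinery for.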

This was echoed by Jason Rathje, a former Air Force acquisitions officer and co-founder of the Defense Department’s Office of Strategic Capital who now leads the public-sector division at webAI.

Frontier models like Claude “are built to answer millions of different kinds of questions for billions of users. Military organizations often need something different, systems tuned for specific operational tasks like logistics planning, equipment maintenance, intelligence analysis, or operational decision support,” Rathje said.

The limitations related to cloud needs are equally important. “Many of today’s frontier models are designed as centralized services for massive commercial user bases, requiring the most advanced chipsets and high-capacity data center infrastructure available, and consuming enormous amounts of power. That makes sense for consumer applications, but military organizations often have very different requirements,” he said. “What defense organizations are asking for is sovereignty: control over the model, the data, and the infrastructure it runs on.”

Smack Technologies is producing two product suites: one to work like the well-known generative AI models, but trained on military intelligence and operator experience; and the other to work in remote battlefields.

Sherman Williams, a Navy veteran and founder of AIN Ventures, has invested in a number of dual-use and defense-focused startups. He acknowledges that no AI startup is going to beat one of the big frontier models in metrics like reasoning benchmarks. But “a model that's 85% as capable but runs on a [denied, disrupted, intermittent, and limited] network at the tactical edge beats GPT-5 in a data center you can't reach.”

Even the data centers you can reach are vulnerable, as shown by Iran's targeting of an AWS facility in Bahrain. “These data centers are important, but they are also vulnerable. Context matters more than benchmarks.”

The new class of DOD-focused AI startups “aren't trying to out-train OpenAI,” he said. “They're building the adaptation and deployment stack that makes open-source models usable in classified settings. Secure fine-tuning, domain-specific models for [intelligence, surveillance, and reconnaissance] and [command and control] edge deployment.”

Williams says he’s seeing “strong pull signals from military customers, especially SOCOM and INDOPACOM”; the latter has been extensively using AI at the headquarters level for more than a year.

He added that DOD buyers and users want to trust the makers of their AI tools, and that trust is more easily forged with founders who are familiar with military operations.

But simply buying veteran-developed AI doesn’t solve a broader problem with large language models: they speak confidently when they shouldn’t, and they often tailor their responses, or even lie, to their users.

Pete Walker, a retired Navy commander and the chief innovation officer at defense AI and cybersecurity firm IntelliGenesis, said that the big frontier models often provide answers users want to hear. 

“The way these models are built, one of the reasons why they're so big, is they encourage conversation,” and that means encouraging users to dive deeper into areas of interest on specific topics, not talking to them honestly. Walker, who holds a Ph.D. in cognitive science, has peer-reviewed research to back up these assertions.

So his company is working to develop a framework for large language models based on counterfactual thinking—presenting alternative points of view to challenge users’ assumptions rather than simply reinforce the ones they brought to the original question. He describes it as getting a model to think, ‘Hey, you're saying that if A then B, but what if it's not A, or what if not B? What does that imply?’ “Those are areas of research that we need,” he said.
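One way to read Walker’s description, as a sketch: before a model answers, expand the user’s “if A then B” claim into prompts that probe its negations. The function below is entirely hypothetical (IntelliGenesis has not published its framework, and the prompt wording and example claim are invented for illustration):

```python
# Hypothetical sketch of counterfactual prompt expansion, in the spirit of
# Walker's "what if it's not A, or what if not B?" description.
# This is illustrative only, not IntelliGenesis's actual framework.

def counterfactual_prompts(antecedent: str, consequent: str) -> list[str]:
    """Given a user's claim 'if A then B', produce challenge prompts that
    probe the negations instead of reinforcing the original assumption."""
    return [
        f"Assume it is NOT the case that {antecedent}. "
        f"Does {consequent} still follow? Why or why not?",
        f"Assume {antecedent} holds but {consequent} does NOT occur. "
        f"What would have to be true for that outcome?",
        f"List the strongest evidence AGAINST the claim that "
        f"{antecedent} implies {consequent}.",
    ]

# Invented example claim, in the spirit of the article:
for prompt in counterfactual_prompts(
    "the adversary jams our satellite links",
    "forward units lose all targeting data",
):
    print(prompt)
```

Each generated prompt would then be sent to the underlying model alongside the original question, so the user sees the challenges to their assumption rather than only a confident confirmation of it.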