This Air Force Targeting AI Thought It Had a 90% Success Rate. It Was More Like 25%
Too little of the right kind of data can throw off target algorithms. But try telling the algorithm that.
If the Pentagon is going to rely on algorithms and artificial intelligence, it’s got to solve the problem of “brittle AI.” A top Air Force official recently illustrated just how far there is to go.
In a recent test, an experimental target recognition program performed well when all of the conditions were perfect, but a subtle tweak sent its performance into a dramatic nosedive, Maj. Gen. Daniel Simpson, assistant deputy chief of staff for intelligence, surveillance, and reconnaissance, said on Monday.
Initially, the AI was fed data from a sensor that looked for a single surface-to-surface missile at an oblique angle, Simpson said. Then it was fed data from another sensor that looked for multiple missiles at a near-vertical angle.
“What a surprise: the algorithm did not perform well. It actually was accurate maybe about 25 percent of the time,” he said.
That’s an example of what’s sometimes called brittle AI, which “occurs when any algorithm cannot generalize or adapt to conditions outside a narrow set of assumptions,” according to a 2020 report by researcher and former Navy aviator Missy Cummings. When the data used to train the algorithm consists of too much of one type of image or sensor data from a unique vantage point, and not enough from other vantages, distances, or conditions, you get brittleness, Cummings said.
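The failure mode Cummings describes can be reproduced in miniature. In the illustrative sketch below, the two-class feature data and the nearest-centroid "model" are invented stand-ins (nothing here is drawn from the Air Force system): a classifier trained on data from one vantage point scores well on more data from that same vantage point, then falls apart when the whole feature distribution shifts.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, shift):
    # Two classes ("missile" / "not missile") in a 2-D feature space;
    # `shift` moves the whole distribution, mimicking a new vantage point.
    x0 = rng.normal(0.0, 1.0, size=(n, 2)) + shift
    x1 = rng.normal(3.0, 1.0, size=(n, 2)) + shift
    return np.vstack([x0, x1]), np.array([0] * n + [1] * n)

# "Train" only on oblique-angle data: store one centroid per class.
X_train, y_train = make_data(500, shift=np.array([0.0, 0.0]))
centroids = np.array([X_train[y_train == c].mean(axis=0) for c in (0, 1)])

def predict(X):
    # Nearest-centroid classifier, a stand-in for the real recognition model.
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

# Same vantage point: the model looks great.
X_same, y_same = make_data(500, shift=np.array([0.0, 0.0]))
acc_same = (predict(X_same) == y_same).mean()

# A new vantage point shifts every feature; accuracy collapses.
X_shift, y_shift = make_data(500, shift=np.array([4.0, 4.0]))
acc_shift = (predict(X_shift) == y_shift).mean()

print(f"same-vantage accuracy:    {acc_same:.2f}")
print(f"shifted-vantage accuracy: {acc_shift:.2f}")
```

Nothing about the model changed between the two evaluations; only the data moved outside the narrow set of assumptions it was trained under.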
In settings like driverless-car experiments, researchers can simply collect more data for training. But that is much harder in military settings, where there might be a whole lot of data of one type—say, overhead satellite or drone imagery—but very little of any other type, because collecting it had no battlefield value.
The military faces an additional obstacle in trying to train algorithms for some object recognition tasks, compared to, for example, companies training object-recognition algorithms for self-driving cars: It’s easier to get pictures and video of pedestrians and streetlights from multiple angles and under multiple conditions than it is to get pictures of Chinese or Russian surface-to-air missiles.
More and more, researchers have begun to rely on what is called “synthetic training data,” which, in the case of military targeting software, would be pictures or video artificially generated from real data to teach the algorithm to recognize the real thing.
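One simple version of this idea is data augmentation: manufacturing additional views from imagery you already have. The 4x4 array below is a toy stand-in for a sensor image; real synthetic-data pipelines render far more realistic imagery from 3-D models, but the principle of multiplying one example into many is the same.

```python
import numpy as np

# A toy 4x4 "sensor image" of a target (values are arbitrary pixels).
img = np.arange(16).reshape(4, 4)

augmented = []
for k in range(4):                    # the four 90-degree rotations...
    rot = np.rot90(img, k)
    augmented.append(rot)
    augmented.append(np.fliplr(rot))  # ...plus a mirrored copy of each

# An asymmetric image yields 8 distinct synthetic views from 1 original.
distinct = {a.tobytes() for a in augmented}
print(f"{len(augmented)} augmented views, {len(distinct)} distinct")
```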
But Simpson said the low accuracy rate of the algorithm wasn’t the most worrying part of the exercise. While the algorithm was only right 25 percent of the time, he said, “It was confident that it was right 90 percent of the time, so it was confidently wrong. And that’s not the algorithm’s fault. It’s because we fed it the wrong training data.”
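That “confidently wrong” behavior is a calibration failure, and it falls straight out of the math of common classifiers: inputs that land far past the decision boundary produce extreme probabilities, whether or not the model has ever seen anything like them. The sketch below uses an invented logistic “target recognizer” (the weights and data are illustrative, not drawn from the Air Force test) to show near-zero accuracy paired with near-total confidence.

```python
import numpy as np

rng = np.random.default_rng(1)

# An invented logistic "recognizer": score = w.x + b, squashed into a
# probability. The weights suit a training distribution whose two
# classes sit around (0, 0) and (3, 3).
w, b = np.array([1.0, 1.0]), -3.0

def predict_with_confidence(X):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # P(class 1)
    pred = (p > 0.5).astype(int)
    conf = np.maximum(p, 1.0 - p)           # confidence in the chosen class
    return pred, conf

# Out-of-distribution inputs: class-0 targets seen from a new vantage
# point land deep on the wrong side of the decision boundary.
X_ood = rng.normal(4.0, 0.5, size=(1000, 2))
y_ood = np.zeros(1000, dtype=int)

pred, conf = predict_with_confidence(X_ood)
accuracy = (pred == y_ood).mean()
mean_confidence = conf.mean()

print(f"accuracy on shifted data: {accuracy:.2f}")
print(f"mean reported confidence: {mean_confidence:.2f}")
```

The model has no mechanism for saying “I haven’t seen anything like this”; the sigmoid only gets more certain the farther the inputs drift.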
Simpson said that such results don’t mean the Air Force should stop pursuing AI for object and target detection. But they do serve as a reminder of how vulnerable AI can be to adversarial action in the form of data spoofing. They also show that AI, like people, can suffer from overconfidence.