In five rounds, an artificially intelligent agent showed that it could outshoot other AIs and a human. So what happens next with AI in air combat?
The never-ending saga of machines outperforming humans has a new chapter. An AI algorithm has again beaten a human fighter pilot in a virtual dogfight. The contest was the finale of the U.S. military’s AlphaDogfight challenge, an effort to “demonstrate the feasibility of developing effective, intelligent autonomous agents capable of defeating adversary aircraft in a dogfight.”
Last August, the Defense Advanced Research Projects Agency, or DARPA, selected eight teams, ranging from large, traditional defense contractors like Lockheed Martin to small groups like Heron Systems, to compete in a series of trials in November and January. In the final, on Thursday, Heron Systems emerged as the victor against the seven other teams after two days of old-school dogfights, going after each other using nose-aimed guns only. Heron then faced off against a human fighter pilot sitting in a simulator and wearing a virtual reality helmet, and won five rounds to zero.
The other winner in Thursday’s event was deep reinforcement learning, wherein artificial intelligence algorithms get to try out a task in a virtual environment over and over again, sometimes very quickly, until they develop something like understanding. Deep reinforcement learning played a key role in Heron Systems’ agent, as well as in that of Lockheed Martin, the runner-up.
Matt Tarascio, vice president of artificial intelligence, and Lee Ritholtz, director and chief architect of artificial intelligence, both of Lockheed Martin, told Defense One that trying to get an algorithm to perform well in air combat is very different from teaching software simply “to fly,” or maintain a particular direction, altitude, and speed. Software begins with a complete lack of understanding about even very basic flight tasks, explained Ritholtz, putting it at a disadvantage against any human, at first. “You don’t have to teach a human [that] it shouldn’t crash into the ground… They have basic instincts that the algorithm doesn’t have,” said Ritholtz. As for what training looks like: “That means dying a lot. Hitting the ground, a lot.”
Tarascio likened it to “putting a baby in a cockpit.”
Overcoming that ignorance requires teaching the algorithm that there’s a cost to every error, but that those costs aren’t equal. The reinforcement comes into play when the algorithm, based on simulation after simulation, assigns weights [costs] to each maneuver, and then reassigns those weights as new experience accumulates.
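To make that loop concrete, here is a deliberately tiny sketch in Python. It is not how Heron's or Lockheed's agents were built: those used deep reinforcement learning, while this stand-in uses a simple running-average update over three made-up maneuvers. The maneuver names, cost values, and parameters are all hypothetical, chosen only to show weights being assigned and then reassigned as simulated experience piles up.

```python
import random

# Hypothetical toy setup: three maneuvers with unequal, hidden average costs.
# Names and numbers are illustrative only, not from any AlphaDogfight agent.
TRUE_COST = {"hold_course": 1.0, "hard_turn": 3.0, "dive_at_ground": 10.0}

def simulate(maneuver, rng):
    """Return a noisy cost for one simulated attempt at a maneuver."""
    return TRUE_COST[maneuver] + rng.uniform(-0.5, 0.5)

def train(episodes=5000, learning_rate=0.1, explore=0.1, seed=0):
    rng = random.Random(seed)
    # The agent starts out knowing nothing: every maneuver looks free.
    weights = {m: 0.0 for m in TRUE_COST}
    for _ in range(episodes):
        if rng.random() < explore:
            # Occasionally try something at random (trial and error).
            maneuver = rng.choice(list(weights))
        else:
            # Otherwise pick the maneuver currently believed to be cheapest.
            maneuver = min(weights, key=weights.get)
        cost = simulate(maneuver, rng)
        # Reassign the weight: nudge the estimate toward the observed cost.
        weights[maneuver] += learning_rate * (cost - weights[maneuver])
    return weights

weights = train()
best = min(weights, key=weights.get)
print(weights, best)
```

Early on the toy agent "dives at the ground" simply because nothing has told it not to; only after paying the cost repeatedly does its weight for that maneuver climb well above the others.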
Here, too, the process varies greatly depending on the inputs, including the conscious and unconscious biases of the programmers in terms of how to structure simulations. “Do you write a software rule based on human knowledge to constrain the AI or do you let the AI learn by trial-and-error? That was a big debate internally. When you provide rules of thumb, you limit its performance. They need to learn by trial-and-error,” said Ritholtz.
Ultimately, it’s no contest how quickly an AI can learn — within a defined area of effort — because it can repeat the lesson anew over and over, on multiple machines.
Lockheed, like several other teams, had a fighter pilot advising the effort. The team was also able to run training sets on up to 25 DGX-1 servers at a time. But what they ultimately produced could run on a single GPU chip.
In comparison, after the victory, Ben Bell, the senior machine learning engineer at Heron Systems, said that their agent had been through at least 4 billion simulations and had acquired at least “12 years of experiences.”
It’s not the first time that an AI has bested a human fighter pilot in a contest. A 2016 demonstration showed that an AI agent dubbed Alpha could beat an experienced human combat flight instructor. But the DARPA simulation on Thursday was arguably more significant as it pitted a variety of AI agents against one another and then against a human in a highly structured framework.
The AIs weren’t allowed to learn from their experiences during the actual trials, which Bell said was “a little bit unfair.” The actual contest did bear that out. By the fifth and final round of the matchup, the anonymous human pilot, call-sign Banger, was able to significantly shift his tactics and last much longer. “The standard things that we do as fighter pilots aren’t working,” he said. It didn’t matter in the end. He hadn’t learned fast enough and was defeated.
Therein lies a big future choice that the military will have to make. Allowing AI to learn more in actual combat, rather than between missions and thus under direct human supervision, would probably speed up learning and help unmanned fighters compete even better against human pilots or other AIs. But that would take human decision-making out of the process at a critical point. Ritholtz said that the approach he would advocate, right now at least, would be to train the algorithm, deploy it, then “bring the data back, learn off it, train again, redeploy,” rather than have the agent learning in the air.
Timothy Grayson, director of the Strategic Technology Office at DARPA, described the trial as a victory for better human and machine teaming in combat, which was the real point. The contest was part of a broader DARPA effort called Air Combat Evolution, or ACE, which doesn’t necessarily seek to replace pilots with unmanned systems, but does seek to automate a lot of pilot tasks.
“I think what we’re seeing today is the beginning of something I’m going to call human-machine symbiosis… Let’s think about the human sitting in the cockpit, being flown by one of these AI algorithms as truly being one weapon system, where the human is focusing on what the human does best [like higher order strategic thinking] and the AI is doing what the AI does best,” Grayson said.