LAS VEGAS, Nev. — Mayhem ruled the day when seven AIs clashed here last week — a bot named Mayhem that, along with its competitors, proved that machines can now quickly find many types of security vulnerabilities hiding in vast amounts of code.
Sponsored by the Defense Advanced Research Projects Agency, or DARPA, the first-of-its-kind contest sought to explore how artificial intelligence and automation might help find security and design flaws that bad actors use to penetrate computer networks and steal data.
Mayhem, built by the For All Secure team out of Carnegie Mellon University, so outclassed its competition that it won even though it was inoperable for roughly half of the contest’s 96 rounds of 270 seconds each. Mayhem pivoted between two autonomous methods of finding bugs and developing ways to exploit them.
Under one method, dubbed symbolic execution, Mayhem tries to figure out how a target program works by systematically replacing sample inputs with classes of inputs. As James King writes in his seminal 1976 paper on the idea: “Each symbolic execution result may be equivalent to a large number of normal test cases. These results can be checked against the programmer’s expectations for correctness either formally or informally.”
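King’s point is that one symbolic result can stand in for an entire class of concrete tests. A toy sketch of that idea follows; the target program, its bug, and the hand-listed path conditions are invented for illustration and bear no relation to Mayhem’s actual engine, which derives such conditions automatically with a constraint solver.

```python
def check(x):
    """Hypothetical target program: hides a bug in a narrow input range."""
    if x > 100 and x < 108:
        raise RuntimeError("bug")
    return "ok"

# Path conditions a symbolic executor would derive for `check`,
# each paired with the outcome predicted for that whole input class.
path_conditions = [
    (lambda x: x <= 100,      "ok"),
    (lambda x: x >= 108,      "ok"),
    (lambda x: 100 < x < 108, "bug"),
]

def witness(cond, lo=-1000, hi=1000):
    """Stand-in for a constraint solver: find one input in the class."""
    return next(x for x in range(lo, hi) if cond(x))

for cond, expected in path_conditions:
    x = witness(cond)
    try:
        actual = check(x)
    except RuntimeError:
        actual = "bug"
    # One representative input validates the entire class of inputs.
    assert actual == expected
    print(f"x = {x:5d} covers its class -> {actual}")
```

Three symbolic paths here do the work of testing two thousand concrete values one by one.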
But symbolic execution has serious drawbacks when used against complex code. Every conditional branch forces the engine to consider more than one possibility, and the number of distinct paths multiplies with each branch it encounters. The result is a problem called path explosion: tracking all those paths quickly exhausts memory.
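The arithmetic behind path explosion is unforgiving. In the worst case, each independent two-way branch doubles the path count; the branch counts below are illustrative numbers, not measurements from any real program.

```python
def path_count(branches):
    # With no pruning, n independent two-way branches yield 2**n paths.
    return 2 ** branches

for n in (10, 20, 30, 40):
    print(f"{n} branches -> {path_count(n):,} paths")
# Even at 40 branches there are over a trillion paths; keeping a
# constraint set in memory for each one is hopeless without pruning.
```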
So the Carnegie Mellon team made their symbolic execution engine smarter than average by helping it prioritize which paths to explore first. “For example, we have found that if a programmer makes a mistake—not necessarily exploitable—along a path, then it makes sense to prioritize further exploration of the path since it is more likely to eventually lead to an exploitable condition,” Carnegie Mellon professor David Brumley and co-authors wrote in a paper that lays out the basics of their approach.
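A minimal sketch of that prioritization idea, loosely in the spirit Brumley describes: paths where a mistake, not necessarily exploitable, has already been observed jump ahead in the exploration queue. The scoring rule and path labels here are invented; the real engine’s heuristics are far richer.

```python
import heapq

frontier = []  # min-heap of (score, path); lower score = explore sooner

def enqueue(path, mistakes_seen):
    # Paths that already exhibited a programmer mistake get priority,
    # on the theory they are likelier to lead to an exploitable state.
    score = 0 if mistakes_seen else 1
    heapq.heappush(frontier, (score, path))

enqueue("path B: off-by-one warning", mistakes_seen=True)
enqueue("path A: clean so far", mistakes_seen=False)
enqueue("path C: clean so far", mistakes_seen=False)

order = []
while frontier:
    _, path = heapq.heappop(frontier)
    order.append(path)
    print("exploring", path)
# path B is explored first despite the others being queued alongside it.
```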
But the bot also uses a second technique, called guided fuzzing, sort of the Oscar Madison to symbolic execution’s Felix Unger. Where symbolic execution is neat and cerebral, fuzzing is messy. For All Secure’s engine, dubbed Murphy, throws random or invalid data at the target code, and watches to see whether it crashes, slows down, or exhibits other behavior that suggests a flaw to exploit.
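In its crudest form, fuzzing really is just that messy. The bare-bones random fuzzer below conveys the flavor, though it is far simpler than Murphy, which guides its mutations with coverage feedback; the target parser and its crash condition are invented for illustration.

```python
import random

def target(data: bytes):
    # Hypothetical parser that crashes on a malformed header byte.
    if len(data) >= 1 and data[0] == 0xFF:
        raise ValueError("parser crash")
    return len(data)

random.seed(1)  # fixed seed so the run is reproducible
crashes = []
for trial in range(20_000):
    # Throw short bursts of random bytes at the target...
    data = bytes(random.randrange(256) for _ in range(random.randrange(1, 8)))
    try:
        target(data)
    except ValueError:
        # ...and save any input that crashes it, for later triage.
        crashes.append(data)

print(f"found {len(crashes)} crashing inputs in 20,000 trials")
```

Random fuzzing finds this shallow bug easily, since roughly one input in 256 trips it; a bug buried behind a multi-byte checksum would defeat this approach, which is exactly where symbolic execution earns its keep.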
“These two components communicate through a … database by sharing testcases they find ‘interesting’, based on the coverage they achieve,” Brumley wrote on the team’s blog. “By using Murphy and Mayhem together, we are able to boost both: the fuzzer is great at quickly finding shallow bugs, but fails on complex cases; Mayhem is good at generating deep paths in a program, but is not always fast enough to explore them all.”
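The shared-testcase scheme Brumley describes can be sketched as a common pool that keeps only inputs reaching new code. Everything below is illustrative: coverage is faked as a set of block IDs, and the component names and inputs are invented.

```python
corpus = {}            # testcase bytes -> blocks of code it covered
global_coverage = set()  # union of all blocks reached so far

def submit(testcase: bytes, blocks_covered: set, source: str):
    """Keep a testcase only if it reaches code nothing else has."""
    new = blocks_covered - global_coverage
    if new:
        corpus[testcase] = blocks_covered
        global_coverage.update(blocks_covered)
        print(f"{source} contributed {testcase!r}: {len(new)} new blocks")

# The fuzzer stumbles onto shallow coverage quickly...
submit(b"AAAA", {1, 2}, source="fuzzer")
# ...the symbolic executor solves its way past a hard branch...
submit(b"MAGIC", {1, 2, 7}, source="symex")
# ...and a redundant input is silently discarded.
submit(b"BBBB", {1, 2}, source="fuzzer")

print(f"corpus size: {len(corpus)}, blocks covered: {len(global_coverage)}")
```

Each component then mutates or extends the other’s finds, which is how the fuzzer’s speed and the symbolic engine’s depth reinforce one another.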
Mayhem wasn’t the only machine that made history at DARPA’s contest, dubbed the Cyber Grand Challenge. Team Shellphish from the University of California at Santa Barbara won a key consolation prize by detecting the so-called Crackaddr bug, a task long thought impossible for a machine reasoning system. Like Mayhem, Shellphish combined constrained symbolic execution and fuzzing, but it used the former to tell the latter what areas to attack.
“If I make a version of it this long,” said Mike Walker, the DARPA program manager, holding his hands about a foot apart to represent a highly abbreviated version of Crackaddr, “I still can’t find a machine that can figure it out. There are papers in 2015 that are like, ‘We still can’t figure this out.’ A machine solved it live in its compiled form in front of a live audience. To the vulnerability community, that was a pretty big deal.”
It will be some time before robots can tackle the most difficult bugs that are out there, but AI promises to remove the easy ones and drastically improve the ability of humans to find the more difficult ones. “It’s not hard for computers to analyze a huge number of programs; we just parallelize. But it’s hard for humans; they are expensive to parallelize and scale,” Brumley said. “It’s hard for computers to analyze a large program in depth … Humans seem better right now at getting ‘deeper’ into large programs.”
But there is hope that AIs will be able to solve less complicated vulnerability problems much faster, and help the human bug hunters who have little chance of keeping up with the enormous volume of code that will permeate the globe in the decades ahead. “People are always asking me the replacement question,” said Walker, meaning: when will an AI replace a human in a given role? “Do you know which person is monitoring a terabyte link?” he asked, referring to a data link carrying a million million bytes. “Nobody. How could we ever look at terabytes?”
Instead, organizations from Apple to the Pentagon are increasingly crowdsourcing their bug hunts. But the military can’t allow the public access to weapons programs such as the F-35 Joint Strike Fighter, which relies on millions of lines of code.
Automated bug hunting in 2016 is in somewhat the same situation as mixed martial arts a quarter-century ago. The sport’s earliest days were arguably its most interesting; each competition promised a never-before-seen clash of styles, approaches, and schools of thought. In the first Ultimate Fighting Championship, held in Denver in 1993, fighters trained to deliver and absorb wild aerial kicks found themselves stunned by Royce Gracie, who used a form of Brazilian jiu-jitsu to pull them to the mat and defeat them on the ground.
A match between two opponents, one trained in Gracie’s techniques and one not, was brutally fast and brilliant to watch. But as Gracie’s influence spread and became mainstream, fighters employing the method lost the element of surprise that had all but guaranteed victory. A fight between two jiu-jitsu-trained fighters looks very different: picture two men locked in an embrace, virtually motionless until one creates just enough room for his opponent to execute a key hold.
This is what the Las Vegas match-up means for the future of information security. We are rapidly leaving the phase of easy takedowns. The AIs will quickly show themselves able to execute all the easy moves faster than any human competitor.
The real match has barely begun.