Stop signs, New York City. Or are they?

Stop signs, New York City. Or are they? David Goehring / Flickr CC2.0

US Spies Want to Know How to Spot Compromised AI

What if you were training an AI, and an adversary slipped a few altered images into its study set?

The US government’s research arm for intelligence organizations, IARPA, is looking for ideas on how to detect “Trojan” attacks on artificial intelligence, according to government procurement documents.

Here’s the problem the agency wants to solve: At a simple level, modern image-recognition AI learns from analyzing many images of an object. If you want to train an algorithm to detect pictures of a road signs, you have to supply it with pictures of different signs from all different angles. The algorithm learns the relationships between the pixels of the images, and how the structures and patterns of stop signs differ from those of speed-limit signs.

But suppose that, during the AI-training phase, an adversary slipped a few extra images (Trojan horses) into your speed-limit-sign detector, ones showing stop signs with sticky notes on them. Now, if the adversary wants to trick your AI in the real world into thinking a stop sign is a speed-limit sign, it just has to put a sticky note on it. Imagine this in the world of autonomous cars; it could be a nightmare scenario.

The kinds of tools that IARPA (Intelligence Advanced Research Projects Activity) wants would be able to detect issues or anomalies after the algorithm has been trained to recognize different objects in images.

This isn’t the only kind of attack on AI that’s possible. Security researchers have also warned about inherent flaws in the way artificial intelligence perceives the world, making it possible to alter physical objects like stop signs to make AI algorithms miscategorize them without ever messing with how it was trained, called “adversarial examples.”

While neither Trojan attacks nor the adversarial examples are known to have been used by malicious parties in the real world, researchers have said they’re increasingly possible. IARPA is looking at a short timeline as well, expecting the program to conclude after a maximum of two years.