Syrian refugee women sitting outside their tents during the visit of Filippo Grandi, the United Nations High Commissioner for Refugees, UNHCR, to a camp in the town of Saadnayel. AP Photo/Bilal Hussein

Refugee or Terrorist? IBM Thinks Its Software Has the Answer

A new tool to turn unstructured data into actionable intelligence could change the way law enforcement fights terrorism, and challenge the data-collection debate.

Tools for turning unstructured data into actionable intelligence are getting better, and that could alter the risk-reward calculation at the heart of the data-collection debate.

Take IBM’s i2 Enterprise Insight Analysis, or i2 EIA. IBM purchased i2 EIA back in 2011 and added in some of the company’s patented cognitive computing capabilities, the most famous of which is Watson, the AI that beat Jeopardy champion Ken Jennings.

IBM believes the tool could help governments separate real refugees from imposters, untangle terrorist cells, or even predict bomb attacks.

Last October, as many European countries were straining to make room for Syrian refugees, other nations were shutting doors, saying that ISIS attackers might try to blend into the throngs.

“Our worldwide team, some of the folks in Europe, were getting feedback that there were some concerns that within these asylum-seeking populations that had been starved and dejected, there were fighting-age males coming off of boats that looked awfully healthy. Was that a cause for concern in regard to ISIS and, if so, could this type of solution be helpful?” said Andrew Borene, strategic initiatives executive at IBM.

IBM hoped to show that the i2 EIA could separate the sheep from the wolves: that is, the masses of harmless asylum-seekers from the few who might be connected to jihadism or who were simply lying about their identities.

“Could we look at a quick background of ISIS leadership based off of existing knowledge stores using unstructured data analytics? Could we identify people who were potentially traveling under false identities or passports? Which identities might they be using? If someone was a sleeper that we came across, would they be building a new legend [or alias]?... How would they be getting those passports?” said Borene.

IBM created a hypothetical scenario, bringing together several data sources to match against a fictional list of passport-carrying refugees. Perhaps the most important dataset was a list of names of casualties from the conflict, gleaned from open press reports and other sources. Some of the material, data related to the black market for passports, came from the Dark Web; IBM says it anonymized or obscured personally identifiable information in this set. Another dataset was made up, but modeled on the kind of metadata currently available to border guards.

The results depended on who was asking what from the data. Borene said the system could provide a score to indicate the likelihood that a hypothetical asylum seeker was who they said they were, and do it fast enough to be useful to a border guard or policeman walking a beat.

Borene was careful to indicate that the hypothetical score was not an absolute indicator of guilt or innocence.

“It’s like a credit score. A credit score is a wonderful piece of data. But it’s a piece of data. For big financial decisions, in addition to looking a score… someone with a high credit score can still be high risk. Someone with a mediocre credit score can be a safer bet,” he said.
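To make the matching idea concrete, here is a minimal sketch in Python of comparing passport names against a casualty list. It is not IBM's method: the names, the 0.85 threshold, and the `best_match` helper are all invented for illustration, and the similarity measure is simply the one in Python's standard `difflib` module. As with the credit score in Borene's analogy, a hit is a prompt for a closer look, not a verdict.

```python
from difflib import SequenceMatcher

# Hypothetical data: every name below is invented for illustration.
casualty_names = ["Alix Sample", "Jordan Testcase"]
passport_holders = ["Alex Sample", "Robin Placeholder"]

# Assumed cutoff; a real system would tune this and weigh other signals.
THRESHOLD = 0.85


def best_match(name, known_names):
    """Return (closest known name, similarity in [0, 1]) for one passport name."""
    scored = [
        (SequenceMatcher(None, name.lower(), known.lower()).ratio(), known)
        for known in known_names
    ]
    score, closest = max(scored)
    return closest, score


flagged = []
for holder in passport_holders:
    closest, score = best_match(holder, casualty_names)
    if score >= THRESHOLD:
        # A near-match to a reported casualty only suggests a second look.
        flagged.append((holder, closest))

print(flagged)
```

Here only "Alex Sample" is flagged, because it differs by one letter from a name on the invented casualty list.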

At a higher level, back at headquarters, an analyst with a dedicated terminal could use the system to immediately see a wide web of possible connections or intersections about a particular subject, potentially revealing places, institutions, other people or targets to which they might be connected. The scrubbed-down Dark Web data was particularly useful here.

“We could come up with a list with this level of granular detail: Here’s the person; here’s the address they were associated with; here are the countries where they were suggesting they could get artificial documents from,” said Borene.

It’s the sort of capability that could feature in a Hollywood techno-thriller. Consider Person of Interest, the hit television show about a massive government-funded data analysis engine that can, in the hands of a scrappy team of vigilantes, anticipate who will be involved in a crime, but not exactly how. With the right data, the i2 EIA platform takes that fictional capability one step further.

Say you’re an analyst, and one of the data sources that you pull is municipal parking tickets. You find that one of the suspicious persons you are looking at received several parking tickets in front of a popular concert venue, and that someone else has also gotten tickets at the same place but at different times. Does this indicate a team working in shifts?

That type of small detail might elude a room full of detectives. If you had to manually enter all the information into a single database, the process could take days or weeks. IBM says i2 EIA makes it instantaneous.

Let’s now say that the data shows that parking violator A and parking violator B also both follow a popular DJ on Twitter, a musician playing at the venue next month. Now you have not only the identity of a potential accomplice but the date and location of a targeted strike. The system can also alert analysts as new events of interest occur and new connections are made, as new people who follow that DJ also get strange parking tickets, try to buy passports from the same vendor, etc.
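The parking-ticket scenario boils down to joining two unrelated data sources on the people they have in common. The Python sketch below shows one way that join might look. It is an illustration only, not i2 EIA's logic: the records, the account names, and the `shared_links` function are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records; every person, place, and account is invented.
parking_tickets = [
    {"person": "A", "location": "concert_hall", "time": "2016-01-04T09:00"},
    {"person": "A", "location": "concert_hall", "time": "2016-01-06T09:10"},
    {"person": "B", "location": "concert_hall", "time": "2016-01-04T21:00"},
    {"person": "B", "location": "concert_hall", "time": "2016-01-06T21:15"},
    {"person": "C", "location": "library", "time": "2016-01-05T12:00"},
]
follows = [("A", "dj_example"), ("B", "dj_example"), ("C", "news_example")]


def shared_links(tickets, follows, min_tickets=2):
    """Find pairs who share a repeat ticket location and a followed account."""
    counts = defaultdict(int)
    for ticket in tickets:
        counts[(ticket["person"], ticket["location"])] += 1

    # person -> locations where they were ticketed at least min_tickets times
    repeat_places = defaultdict(set)
    for (person, location), n in counts.items():
        if n >= min_tickets:
            repeat_places[person].add(location)

    # person -> accounts they follow
    accounts = defaultdict(set)
    for person, account in follows:
        accounts[person].add(account)

    results = []
    for p, q in combinations(sorted(repeat_places), 2):
        places = repeat_places[p] & repeat_places[q]
        common = accounts[p] & accounts[q]
        if places and common:
            results.append((p, q, sorted(places), sorted(common)))
    return results


links = shared_links(parking_tickets, follows)
print(links)
```

With this toy data the only pair returned is A and B, linked by the concert hall and the DJ's account. Whether such a link means anything is still a question for a human analyst.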

Another scenario that Borene’s group ran as part of the demonstration involved a hypothetical bomb at a train station, detonated by an SMS text. Using made-up but realistic SMS and phone metadata for a typical urban area, an analyst opened a map, drew a circle around the area, and, using the exact moment of detonation, discovered the number of the phone that had sent the text. Searching on that number immediately brought up more phone numbers, addresses of potentially connected individuals, and social security numbers, all related to the original number.

“From a variety of sources comes a single baseball card showing all the information,” Borene said.
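The circle-on-a-map query in the train-station scenario is, at its core, a filter on distance and time. The Python sketch below illustrates that filter under stated assumptions: the phone numbers, coordinates, and two-second window are made up, each record is assumed to carry the location of the receiving handset, and `senders_near` is a hypothetical helper, not part of any IBM product.

```python
import math
from datetime import datetime


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


# Hypothetical metadata; lat/lon is assumed to be the receiving handset's position.
sms_log = [
    {"sender": "555-0101", "receiver": "555-0199", "lat": 10.0000, "lon": 20.0000,
     "time": datetime(2016, 1, 10, 8, 30, 0)},
    {"sender": "555-0102", "receiver": "555-0150", "lat": 10.0005, "lon": 20.0005,
     "time": datetime(2016, 1, 10, 7, 0, 0)},
    {"sender": "555-0103", "receiver": "555-0151", "lat": 10.5000, "lon": 20.0000,
     "time": datetime(2016, 1, 10, 8, 30, 1)},
]


def senders_near(log, center, radius_km, moment, window_s=2):
    """Return senders of texts received inside the circle around the given moment."""
    hits = []
    for rec in log:
        close = haversine_km(center[0], center[1], rec["lat"], rec["lon"]) <= radius_km
        timely = abs((rec["time"] - moment).total_seconds()) <= window_s
        if close and timely:
            hits.append(rec["sender"])
    return hits


hits = senders_near(sms_log, center=(10.0, 20.0), radius_km=0.5,
                    moment=datetime(2016, 1, 10, 8, 30, 0))
print(hits)
```

Only the first record passes both tests. The second is inside the circle but 90 minutes early, and the third is on time but roughly 55 kilometers away.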

Software Design Predicts Life … And Controversy

IBM finished building the demonstration in October, just before ISIS-affiliated attackers killed 130 people in Paris and injured hundreds more. One of the attackers was carrying a false passport, according to Agence France-Presse. German interior minister Thomas de Maiziere speculated that the attacker may have carried the passport deliberately to turn public sympathy against refugees.

If that was the plan, it worked. In the U.S., presidential contenders, led by Sen. Ted Cruz, R-Texas, along with governors in more than 30 states, called for a moratorium on the resettlement of Syrian war refugees.

The reactions were seen by many as extreme. As Defense One has reported, the United Nations High Commissioner for Refugees, or UNHCR, conducts a variety of screening tests to ensure that refugees are who they say they are. The Department of Homeland Security likewise conducts screening on refugees headed into the United States.

But the threat isn’t entirely manufactured, at least not by political opportunists. Recent reports indicate that ISIS has developed an “industrial” ability to produce false passports. Platforms like i2 EIA could potentially strengthen screening processes and reassure the public about the identities of those seeking to enter the country, while at the same time depriving some ambitious office-seekers of a fearful talking point.

But that doesn’t mean such capabilities will come without risk or controversy.

IBM representatives pointed out that the i2 EIA doesn’t collect intelligence; it just helps ingest and make sense of unstructured data. They aren’t spies or agents or operatives, just engineers.

But the more data i2 EIA gets, the more helpful it becomes. And that could put pressure on lawmakers and others to keep feeding new, and potentially sensitive, data streams to the beast.

Consider the debate over the U.S. government’s collection of the public’s bulk telephony metadata. The NSA’s program officially expired in November, but now telephone companies must collect the metadata themselves and give it to the government under certain conditions. Many defenders of the old program maintained that it was an essential tool against terrorism. The National Academy of Sciences publication Bulk Collection of Signals Intelligence: Technical Options enshrines this view clearly: “There is no software technique that will fully substitute for bulk collection where it is relied on to answer queries about the past after new targets become known.”

Yet defenders of the government’s program couldn’t connect the practice of bulk metadata collection to clear public safety outcomes, at least not in a way that many members of Congress, the president, or the public found convincing.

That may not have been a reflection of the data so much as a poverty of tools to analyze it correctly.

Bottom line: the tools are improving quickly, and that could alter calculations as the public and its leaders debate future data-collection policies.