IBM's "Watson" computer at its offices in Yorktown Heights, N.Y. IBM, Bob Goldberg/AP

Teaching Machines How to Spell Will Help Catch Terrorists

It’s time anti-terrorism technology moved beyond finding ways to replace humans and started finding ways to work with us.

By David Murgatroyd

For all of our post-9/11 security, here’s a chilling fact: Boston Marathon bomber Tamerlan Tsarnaev might have been caught before his attack if the keepers of one terrorism watch list had done just one thing differently: spelled his name right.

A recent report from the House Homeland Security Committee exposed an embarrassing missed opportunity. Although Tsarnaev was on a watch list, he was not detained on his return to the United States from Dagestan because the list used a different spelling of his name: “Tsarnayev”. A human would have caught the error. But the growing number of requests like this that security professionals face means that we have to rely more and more on software. More and more, machines are the first guard on watch.

Obviously we need technology to be more robust to catch simple spelling variations in suspects’ names -- but how? Teaching software to understand the nuances of names and speech is an enormous national security challenge. If computer programs blindly start hunting misspellings without nuance, your name may be just a typo or two away from one of the hundreds of thousands of suspects on government watch lists. False positives erode public faith in security procedures, inconvenience innocent passengers and make it harder for security professionals to spot real bad actors.
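To make that trade-off concrete, here is a minimal sketch of one common way to tolerate spelling variation: counting the single-character edits that separate two names (Levenshtein distance). It is an illustration only, not a description of how any real watch-list system works. The function names and every list entry other than “Tsarnayev” are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def fuzzy_matches(query: str, watch_list: list[str], max_edits: int) -> list[str]:
    """Return watch-list names within max_edits of the query, ignoring case."""
    q = query.lower()
    return [name for name in watch_list
            if edit_distance(q, name.lower()) <= max_edits]


# Hypothetical list: only "Tsarnayev" comes from the article.
WATCH_LIST = ["Tsarnayev", "Tsarukyan", "Sarnev"]

for max_edits in (0, 1, 2):
    print(max_edits, fuzzy_matches("Tsarnaev", WATCH_LIST, max_edits))
```

An exact match (zero edits) misses “Tsarnayev”, one allowed edit finds it, and two allowed edits also pull in an unrelated name. That last step is the false-positive problem in miniature.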

Unfortunately, because the system Tsarnaev slipped through is classified, we can’t prescribe how to fix it. But we can learn from Jeopardy! When IBM’s computer “Watson” famously won the TV game show, it had access to 200 million pages of content via 90 powerful servers. But Watson made its own embarrassing mistake on a question in the category of “U.S. Cities.” The answer was: “Its largest airport is named for a World War II hero; its second largest, for a World War II battle.” Watson responded with the question, “What is Toronto?????”, which isn’t just the wrong answer; it’s in the wrong category! If only it had known Toronto is not a U.S. city, it could have given the next answer down its list, the right one: Chicago.

Why did this prove to be such a challenge? IBM explained in a blog post: “This is just one of those situations that’s a snap for a reasonably knowledgeable human but a true brain teaser for the machine.” First, there’s the variety and volume of data that Watson ingested. To Watson, “Toronto” is ambiguous. It had read about many places around the world called Toronto, including several towns in the Midwest. Second, modern artificial intelligence systems often “think” about this data by using fuzzy connections among strings of text. This could cause Watson to conclude that, because Toronto’s professional baseball team is in the American League, Toronto itself is in the U.S.

When we move beyond game shows to stopping terrorists, curious errors can have fatal consequences. Such pressing challenges have grown text analytics technology into a $1 billion market with 25 percent additional growth predicted in 2015. While commercial organizations deploy text analytics to improve search results or provide a better shopping experience, government leaders seek to enhance military and intelligence operations and improve public safety. But truly measuring the return on this investment requires not just measuring how systems help users, but also determining how users help systems – and thereby crossing technology’s next frontier.

Watson represents a pinnacle of the era of automating human tasks – the culmination of over a half-century of technology advancements. But as the Toronto example makes clear, Watson needs a partner. Assisting humans in their tasks is the theme of the next half-century. Just imagine if Watson collaborated with Jeopardy! champion Ken Jennings, or just with you or me. We could establish a virtuous cycle where our correction of simple errors would result in subtle improvements down the line.

This sort of collaboration is available now. If an agent at JFK airport the day Tamerlan Tsarnaev came through had had a newer system that handled more spelling variation, she would have seen the “Tsarnayev” match but also some innocent results. She would have eliminated these mismatches by diving deeper into the person’s itinerary or history on the watch list. Every time she interacts with the data – homing in on people of interest and discarding the misfires – the computer understands this and presents the next batch of results with greater awareness of what is being sought. All of this is conducted in a very natural way, much like two people working together, side-by-side.

As the collaboration between the person and the computer becomes richer, each will want the ability to communicate their confidence about certain judgments to the other. You can see a hint of this in the question marks in Watson’s “What is Toronto?????”. It’s almost as if it was asking for help. Those question marks would be a strong clue to a human collaborator to focus on that answer and its alternatives.

Confidence can also avoid problems when the system’s default settings filter out either too much or too little. When U.S. officials are planning for a future event, like a visiting head of state, and need to assess the security risk, they want the deepest possible information on potential threats and bad actors. In this case, because they have the time, analysts will want to investigate even items of low confidence that might point to a security threat. However, when responding to a time-sensitive event, such as after the Boston bombing, officials have time to pay attention only to the highest-quality results, so they can filter out results with less than 90 percent confidence. Those filtered-out results will be available later, when officials can conduct deeper inquiry and analysis.
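The two modes described above can be sketched as a single adjustable threshold. This is a minimal illustration under stated assumptions: the `Match` record, the `triage` function, and the sample names and scores are all hypothetical, and a confidence score is assumed to be a number between 0 and 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    name: str
    confidence: float  # assumed scale: 0.0 (no confidence) to 1.0 (certain)


def triage(matches, threshold):
    """Split matches into (review_now, review_later).

    Both lists are sorted from highest to lowest confidence. Nothing is
    discarded: results below the threshold are only deferred.
    """
    ranked = sorted(matches, key=lambda m: m.confidence, reverse=True)
    review_now = [m for m in ranked if m.confidence >= threshold]
    review_later = [m for m in ranked if m.confidence < threshold]
    return review_now, review_later


# Hypothetical results from a name search.
SAMPLE = [Match("A", 0.62), Match("B", 0.97), Match("C", 0.30), Match("D", 0.91)]

urgent, deferred = triage(SAMPLE, threshold=0.90)  # time-sensitive response
everything, _ = triage(SAMPLE, threshold=0.0)      # advance planning
print([m.name for m in urgent], [m.name for m in deferred])
```

The same search serves both situations. Only the threshold changes, and the deferred list keeps the lower-confidence results on hand for later analysis.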

Even with a collaborative interface between the human and computer that allows the expression of confidence, if the computer is stuck thinking about strings of text, then the user may have a hard time feeding it useful information. Telling Watson that “Toronto” is a wrong answer to the question above will only help if it has some way of internalizing that the most popular real-world city by that name is not in the U.S.

Historically, national security operatives and analysts have attempted to overcome computers’ focus on strings of text by tediously curating complex search queries: long lists of keywords joined by ANDs, ORs and NOTs to try to capture every possible string of relevant text. Unfortunately, this method fails at finding new variations, misspellings or terms. Instead, analysts should be able to issue queries with respect to known key players – persons, places and other things – in the data. The system can filter the data based on real-world things and the analyst can refine the results.
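The difference between the two query styles can be sketched in a few lines. This is a toy illustration, not a real search system: it assumes some earlier step has already linked each name mention to an entity identifier, and the documents, the `person:1` identifier and both function names are hypothetical.

```python
# Hypothetical documents. The "entities" field assumes an upstream step has
# already resolved each name mention to a real-world entity ID.
DOCS = [
    {"text": "Tsarnaev boarded a flight to Dagestan", "entities": {"person:1"}},
    {"text": "Tsarnayev returned via JFK", "entities": {"person:1"}},
    {"text": "Marathon route announced", "entities": set()},
]


def keyword_query(docs, any_of, none_of=()):
    """Boolean string search: (term OR term ...) AND NOT (term OR term ...)."""
    hits = []
    for doc in docs:
        words = doc["text"].lower().split()
        wanted = any(term.lower() in words for term in any_of)
        excluded = any(term.lower() in words for term in none_of)
        if wanted and not excluded:
            hits.append(doc)
    return hits


def entity_query(docs, entity_id):
    """Key-player search: match on the resolved entity, not the spelling."""
    return [doc for doc in docs if entity_id in doc["entities"]]


print(len(keyword_query(DOCS, ["Tsarnaev"])))  # finds only the spelling typed
print(len(entity_query(DOCS, "person:1")))     # finds every linked mention
```

The keyword query finds only the spelling the analyst thought to type, and every new variant means another OR clause. The entity query returns both documents because the work of linking spellings to a person has been done once, by the system.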

In addition, a system centered on key players instead of keywords can help users track “ghosts” – entities appearing in raw data but that are otherwise unknown. Imagine a hypothetical terrorist incident, where a few hours earlier the computer sees social media messages on networks of interest. By focusing on key players, not keywords, it realizes that although the level of “chatter” is normal, a single person is being frequently mentioned in these messages by a variety of names. Although this person is unknown to the system, it’s able to connect all the names together as referring to him and to flag him as receiving an unusually high amount of attention. An analyst is alerted to this, reviews the messages that the computer has grouped and takes action.
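One rough way to picture the “ghost” scenario is to group name mentions by string similarity and flag any group that is mentioned often under several spellings. The sketch below uses Python’s standard `difflib.SequenceMatcher` as a stand-in similarity measure. The names, thresholds and function names are invented for illustration, and a production system would use far richer evidence than spelling alone.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.8):
    """True if two names are close enough, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def group_mentions(mentions, threshold=0.8):
    """Greedy grouping: each mention joins the first group whose first
    member it resembles, or starts a new group."""
    groups = []
    for name in mentions:
        for group in groups:
            if similar(name, group[0], threshold):
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


def flag_ghosts(mentions, min_mentions=3, min_variants=2):
    """Flag groups mentioned often and under more than one spelling."""
    return [g for g in group_mentions(mentions)
            if len(g) >= min_mentions and len(set(g)) >= min_variants]


# Invented names standing in for mentions pulled from messages.
MENTIONS = ["Viktor Maren", "Victor Maren", "Lena Ost",
            "viktor marin", "Viktor Maren"]

for group in flag_ghosts(MENTIONS):
    print(len(group), "mentions:", sorted(set(group)))
```

Here no single spelling dominates, but the grouped mentions add up to four references to what is probably one person. That group is what would be put in front of an analyst for review.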

It’s an exciting time to be applying human-computer collaboration to the challenges before us. Systems like Watson may need us as much as we need them.

David Murgatroyd is vice president of engineering at Basis Technology. He has been building human language technology systems since 1998. Twitter: @dmurga
