Everyone gets to use different definitions of dangerous imagery, says the inventor of the software they’ll use.
Just one day before President Barack Obama touted efforts to undermine the online reach of jihadist groups (“ISIS propaganda has been cut in half,” he said at U.S. Central Command headquarters in Tampa), four tech giants — Facebook, Twitter, Microsoft, and Google’s YouTube — announced that they will collaborate on a database of manually tagged extremist content. But the project’s originator cautioned that infighting among the technology companies would keep the program from working as effectively as it could.
“Starting today, we commit to the creation of a shared industry database of ‘hashes’ — unique digital ‘fingerprints’ — for violent terrorist imagery or terrorist recruitment videos or images that we have removed from our services. By sharing this information with each other, we may use the shared hashes to help identify potential terrorist content on our respective hosted consumer platforms. We hope this collaboration will lead to greater efficiency as we continue to enforce our policies to help curb the pressing global issue of terrorist content online,” Facebook officials said in a statement.
The software that records the fingerprints comes from Dartmouth College computer scientist Hany Farid, who developed it with a grant provided by Microsoft and alongside the Counter Extremism Project, or CEP. As Farid described the software to reporters in June, a hash is “a string of numbers that embody that actual, underlying content.” In essence, you teach the software to recognize an image or video, even if it’s lightly altered, without trying to teach the software what the image means. It’s an approach that requires far less computing power than, say, deep neural networks, which seek to teach machines even simple concepts, like “cat.”
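The robustness Farid describes can be illustrated with a toy “average hash,” a much simpler technique than his actual (proprietary) software: the fingerprint encodes only whether each region of an image is brighter than average, so a uniform brightness shift leaves it unchanged while a genuinely different image produces a distant hash. This is a minimal sketch for illustration only, not the companies’ algorithm, using an 8×8 grid of grayscale values in place of a real image.

```python
def average_hash(pixels):
    """Hash an 8x8 grayscale grid into a 64-bit fingerprint:
    each bit is 1 if that pixel is brighter than the grid's mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming(h1, h2):
    """Count the differing bits between two fingerprints."""
    return bin(h1 ^ h2).count("1")

# Toy "image": bright top half, dark bottom half.
original = [[200] * 8 for _ in range(4)] + [[50] * 8 for _ in range(4)]
# Lightly altered copy: every pixel brightened by 10.
brightened = [[p + 10 for p in row] for row in original]
# Unrelated image: bright left half, dark right half.
different = [[200] * 4 + [50] * 4 for _ in range(8)]

assert hamming(average_hash(original), average_hash(brightened)) == 0
assert hamming(average_hash(original), average_hash(different)) == 32
```

The altered copy hashes identically to the original while the unrelated image lands 32 bits away, which is the property that lets a shared database flag re-uploads of known content without any understanding of what the content depicts.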
Social networks already use the technology to tag and block child pornography. (Farid licenses the software for free to the National Center for Missing and Exploited Children, or NCMEC.) Facebook and YouTube reportedly began using it on extremist imagery in June. On Tuesday, Twitter, a lone holdout, posted that it would adopt the hashing procedure.
But there’s a critical difference between the way technology companies use hashing to block child pornography and the way they are proposing to block jihadist imagery: each will apply its own definitions and standards.
In the case of child pornography, NCMEC distributes the central database. If you have a problem with the way an image or video is tagged, you can take it up with the Center, which also serves as a point of accountability. That means that evolving definitions of child pornography are more universal and rapidly updatable. That won’t be the case with the hashed extremist images. Despite the announced “shared industry database,” companies will be able to decide for themselves what to allow and what to block.
“Each company will continue to apply its own policies and definitions of terrorist content when deciding whether to remove content when a match to a shared hash is found,” Facebook’s statement said. “And each company will continue to apply its practice of transparency and review for any government requests, as well as retain its own appeal process for removal decisions and grievances. As part of this collaboration, we will all focus on how to involve additional companies in the future.”
The Counter Extremism Project on Tuesday praised the move, if in tempered language. “While today’s announcement is a welcome development, the challenge moving forward will be to ensure that the social media companies are transparent about the nature, extent and metrics of the content to be hashed and that the technology is swift and accurate. In order for this work to have a measurable impact, these companies must be held fully accountable when deploying the technology,” Project officials wrote.
“I think this is a good first step,” said Farid. “The thing I am concerned about, and that CEP is concerned about, is the lack of transparency and accountability in the system. We’re going to do this joint coalition; what are they going to be looking for? How aggressively is it going to be done? How often are they updating the database?”
Relying on each company to decide what is or is not “extremist,” rather than relying on a third-party repository, will hurt the overall effectiveness, Farid said. “This is a consortium in the loosest sense possible. … It’s a bit of a hodgepodge.”
He contrasted it with the way NCMEC administers its database.
“Because all of the software was licensed through them, all participating companies, which is now hundreds of them, were eliminating the same content uniformly across the Internet. That was very powerful,” he said.
Farid said that, even though it’s his technology that the companies would use, he heard about the final decision just 12 hours before the announcement. “I think, frankly, that they didn’t want to have that oversight” of a third party. “I think, initially, Microsoft and Facebook said, ‘Yes, we want to do this with the [CEP].’ I think Google hemmed and hawed a little bit. Twitter really hemmed and hawed. I think this was the compromise. All the companies wanted everybody on board for political cover.”
Regardless, he described the move as progress. “Are we done? Can we say this is a victory? I think the devil is in the details. If we feel they’re not being aggressive, we’ll have that conversation,” he said.