By Patrick Tucker
March 28, 2014
The president and congressional leaders want to end NSA bulk metadata collection, but not the use of metadata, which may even be expanded. From a technical perspective, the question of what your metadata can reveal about you, or about potential enemies, remains as important as it has been since the Edward Snowden scandal broke. The answer is more than you might think.
First, the background. On Thursday, the Obama administration released a brief statement on ending the collection of metadata and limiting, slightly, the circumstances under which metadata could be accessed. The timing was in keeping with a self-imposed deadline to create legislation to address NSA bulk collection. The statement said “the government will not collect these telephone records in bulk; rather, the records would remain at the telephone companies for the length of time they currently do today.”
Two leaders of the House Intelligence Committee, Reps. Michael Rogers, R-Mich., and Dutch Ruppersberger, D-Md., are also putting forward a proposal, called the “End Bulk Collection Act,” which would likewise shift the collection of bulk metadata from the NSA to phone companies.
The companies would be required to keep the data no longer than 18 months, as opposed to the five years it is currently held by the NSA. But the House bill would also broaden the circumstances under which the government could access metadata, lowering the standard from probable cause to the far more nebulous “reasonable articulable suspicion.”
In a USA Today op-ed from last July, Ruppersberger argued that the practice of collecting metadata was benign. But is it?
“The phone-records tool is not some wildly intrusive surveillance program. In reality, what we are talking about is collection of ‘metadata,’ not content. No names, no addresses and absolutely no conversations,” he wrote.
Recent research shows that the sort of metadata the NSA uses in its investigations is actually highly personal.
A group of researchers from the MIT Media Lab found that your metadata — including, but not limited to, the way in which you use your phone, how you make calls, to whom, for how long, etc. — can serve as an indicator of your personality.
Here’s how they figured it out. The researchers, Yves-Alexandre de Montjoye, Jordi Quoidbach, Florent Robic and Sandy Pentland, had 100 students fill out surveys that scored their personalities along five distinct personality types: neuroticism, openness, extraversion, agreeableness and conscientiousness.
These types are in keeping with the so-called Five Factor Model of Personality, a widely used method for describing personality traits. Once the researchers had the survey data to show how each of the subjects fell along the spectrum, they examined the subjects’ phone records between March 2010 and June 2011, well within the new 18-month window. Specifically they looked at these metadata elements:
Once the researchers had values for these behaviors, they ran the results through a machine-learning algorithm to determine how each one relates to personality type. De Montjoye is careful to point out that there isn’t a one-to-one matchup between a specific observed behavior and a specific personality. So if your radius of gyration, for instance, is particularly large, that doesn’t serve as a clear indicator of neuroticism. Rather, it’s the combination of behaviors and the strength of the data available that allows the model to come up with predictions.
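To make the “radius of gyration” concrete: in mobility research it is commonly defined as the root-mean-square distance between the places a person is observed and the center of those places. The article does not give the researchers’ exact formula, so the sketch below uses that common definition as an assumption. The function name and the coordinates are hypothetical, and the flat-earth distance shortcut is only reasonable over city-scale areas.

```python
import math

def radius_of_gyration(points):
    """Root-mean-square distance (km) of observed points from their center.

    `points` is a list of (latitude, longitude) pairs in degrees, for
    example one pair per logged call. Assumption: a flat-earth
    approximation, adequate only for city-scale distances.
    """
    if not points:
        raise ValueError("need at least one point")
    km_per_deg = 111.32  # approximate length of one degree of latitude
    lat0 = sum(lat for lat, _ in points) / len(points)
    lon0 = sum(lon for _, lon in points) / len(points)
    # Degrees of longitude cover less ground as latitude increases.
    scale = math.cos(math.radians(lat0))
    squared = [
        ((lat - lat0) * km_per_deg) ** 2
        + ((lon - lon0) * km_per_deg * scale) ** 2
        for lat, lon in points
    ]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical users: one stays within a few blocks, one ranges widely.
homebody = [(42.360, -71.090), (42.361, -71.091), (42.359, -71.089)]
commuter = [(42.360, -71.090), (42.450, -71.230), (42.250, -71.000)]

print(round(radius_of_gyration(homebody), 2), "km")
print(round(radius_of_gyration(commuter), 2), "km")
```

A single number like this summarizes how far someone roams, which is why it works as one input among many and not as a verdict on its own.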
“We let the algorithm determine the right mix,” he said. “Each indicator is useful but is conditional on all the other indicators. That doesn’t mean each one is causal or that people who travel more are neurotic. Let’s say that the relationships between A and B are not linear, if you do a linear progression you see no relationship; you do a quadratic progression, you do see how A can predict B.”
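De Montjoye’s point about A and B can be shown with a toy example. The numbers below are invented for illustration and have nothing to do with the study’s data: B depends entirely on A, yet a straight-line measure of association finds nothing, while the same measure applied to the square of A finds a perfect relationship.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: strength of the straight-line relationship."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Toy data (an assumption for illustration): B is exactly A squared.
a = [-3, -2, -1, 0, 1, 2, 3]
b = [x * x for x in a]

linear_view = pearson(a, b)                      # A against B
quadratic_view = pearson([x * x for x in a], b)  # A squared against B

print("linear:", round(linear_view, 3))
print("quadratic:", round(quadratic_view, 3))
```

A model that is allowed to try nonlinear combinations can pick up the second kind of relationship, which is why no single indicator can be read in isolation.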
The model, in other words, can’t tell you which behavior to change to make your personality less predictable.
Here’s what it can do: predict personality type much better than random guessing. When the researchers compared the model’s guesses for each subject’s personality, as measured by the survey, against random guesses, they found that the model performed much better at predicting all of the personality types: about 42 percent better on average, and as high as 63 percent.
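One way to read a “percent better than random” figure is as a relative gain over the accuracy that blind guessing would achieve. The sketch below assumes equally likely classes and uniform guessing; the article does not say how many classes the researchers used, so the three-class setup and the accuracy value here are hypothetical.

```python
def improvement_over_random(accuracy, n_classes):
    """Relative gain of a classifier over uniform random guessing.

    Assumption: classes are equally likely, so random guessing is
    right 1 / n_classes of the time.
    """
    baseline = 1.0 / n_classes
    return (accuracy - baseline) / baseline

# Hypothetical example: a model that is right half the time when
# sorting people into three groups (low, average, high).
gain = improvement_over_random(accuracy=0.5, n_classes=3)
print(f"{gain:.0%} better than random")
```

Under these assumptions, a model that is right half the time on three groups is 50 percent better than chance, even though it is still wrong half the time.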
The paper was published in the Proceedings of the 6th International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction.
“We see a lot of comments along the lines of ‘It’s only metadata. It’s not personal. And it only gets personal when a human looks into it,’” said de Montjoye. “We wanted to show an example at a small scale of what you might be able to do” with that data on how long calls last, when they are made, and where.
“At the end of the day, the vast majority of the use of this data is extremely positive,” said de Montjoye, citing the utility of metadata in city planning, emergency response and other areas. He said he wanted to help researchers and the public develop a better “understanding of what can be done as well as the limits of privacy. This is really why we do this.”
From a national security perspective, the use of metadata remains a powerful tool for finding links between people, including potential enemies. However, despite the reassurances of Ruppersberger, President Barack Obama and others that the data isn’t “personal,” it lends itself easily to creating windows into private lives.
Patrick Tucker is technology editor for Defense One. He’s also the author of The Naked Future: What Happens in a World That Anticipates Your Every Move? (Current, 2014). Previously, Tucker was deputy editor for The Futurist for nine years. Tucker has written about emerging technology in Slate, The Sun, MIT Technology Review, Wilson Quarterly, The American Legion Magazine, BBC News Magazine, Utne Reader, and elsewhere.