Despite assurances that metadata is free of content, new research shows that it can be highly personal.
By Patrick Tucker
The president and congressional leaders want to end NSA bulk metadata collection, but not the use of metadata, which may even be expanded. From a technical perspective, the question of what your metadata can reveal about you, or about potential enemies, remains as important as it has been since the Edward Snowden scandal. The answer is more than you might think.
First, the background. On Thursday, the Obama administration released a brief statement on ending the bulk collection of telephone metadata and limiting, slightly, the circumstances under which metadata could be accessed. The timing was in keeping with a self-imposed deadline to create legislation to address NSA bulk collection. The statement said “the government will not collect these telephone records in bulk; rather, the records would remain at the telephone companies for the length of time they currently do today.”
Two leaders of the House Intelligence Committee, Reps. Michael Rogers, R-Mich., and Dutch Ruppersberger, D-Md., are also putting forward a proposal, called the “End Bulk Collection Act,” which would likewise shift responsibility for holding bulk metadata from the NSA to the phone companies.
The companies would be required to keep the data for no longer than 18 months, compared with the five years the NSA currently holds it. But the House bill would also broaden the circumstances under which the government could access metadata, from probable cause to the far more nebulous “reasonable articulable suspicion.”
In a USA Today op-ed last July, Ruppersberger argued that the practice of collecting metadata was benign.
“The phone-records tool is not some wildly intrusive surveillance program. In reality, what we are talking about is collection of ‘metadata,’ not content. No names, no addresses and absolutely no conversations,” he wrote. But is it really so benign?
Recent research shows that the sort of metadata the NSA uses in its investigations can, in fact, be highly personal.
A group of researchers from the MIT Media Lab found that your metadata (how you use your phone, which calls you make, to whom, for how long and so on) can serve as an indicator of your personality.
Here’s how they figured it out. The researchers, Yves-Alexandre de Montjoye, Jordi Quoidbach, Florent Robic and Sandy Pentland, had 100 students fill out surveys measuring where they fell on five personality dimensions:
- Neurotic: Defined roughly as a higher-than-normal tendency to experience unpleasant emotions
- Open: Defined as broadly curious and creative
- Extroverted: As in, looks toward others for stimulation
- Agreeable: As in warm, compassionate and cooperative
- Conscientious: Self-disciplined, organized and eager for success
These dimensions come from the so-called Five Factor Model of Personality, a widely used framework for describing personality traits. Once the researchers had survey data showing where each subject fell on each dimension, they examined the subjects’ phone records between March 2010 and June 2011, a span within the proposed 18-month retention window. Specifically, they looked at these metadata elements:
- Basic phone use, including the number of calls
- Active user behaviors, such as the number of calls initiated and the time it took the subject to answer a text
- Location: how far the subject moved, the number of places from which calls were made and the so-called radius of gyration, a measure of how far a person typically ranges from the center of his or her usual locations
- Regularity of calling routine
- Diversity, which relates the subject’s total number of contacts to how frequently he or she interacts with each of them
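To make two of these indicators concrete, here is a minimal Python sketch that computes them from a toy call log. It is not the researchers’ code: the record layout (a contact ID plus a position in flat kilometre coordinates) and all function names are hypothetical. Radius of gyration follows its usual definition in mobility research, the root-mean-square distance of a person’s locations from their center of mass. Because the article’s description of diversity is loose, the sketch shows two plausible readings: a simple contacts-to-interactions ratio and, as a swapped-in alternative, the Shannon entropy of interactions across contacts.

```python
import math
from collections import Counter

# Hypothetical record: (contact_id, x_km, y_km). Positions are assumed to be
# projected onto a flat plane in kilometres; real carrier data would more
# likely give cell-tower locations.
CallRecord = tuple[str, float, float]


def radius_of_gyration(records: list[CallRecord]) -> float:
    """Root-mean-square distance of call locations from their center of mass."""
    n = len(records)
    cx = sum(x for _, x, _ in records) / n
    cy = sum(y for _, _, y in records) / n
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for _, x, y in records) / n)


def contacts_to_interactions(records: list[CallRecord]) -> float:
    """Distinct contacts divided by total interactions (1.0 = never repeats a contact)."""
    return len({contact for contact, _, _ in records}) / len(records)


def contact_entropy(records: list[CallRecord]) -> float:
    """Shannon entropy, in bits, of how interactions are spread across contacts."""
    counts = Counter(contact for contact, _, _ in records)
    total = sum(counts.values())
    return -sum((k / total) * math.log2(k / total) for k in counts.values())


# Toy log: four calls at the corners of a 2 km square, half of them to "alice".
log = [
    ("alice", 0.0, 0.0),
    ("alice", 2.0, 0.0),
    ("bob", 0.0, 2.0),
    ("carol", 2.0, 2.0),
]
print(f"radius of gyration: {radius_of_gyration(log):.3f} km")
print(f"contacts/interactions: {contacts_to_interactions(log):.2f}")
print(f"contact entropy: {contact_entropy(log):.2f} bits")
```

In this toy log every call sits about 1.414 km from the center, and because Alice accounts for half the calls the ratio is 0.75 and the entropy is 1.5 bits; a subject who spread four calls evenly across four different people would score 1.0 and 2 bits.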
Once the researchers had values for these behaviors, they ran the results through a machine-learning algorithm to determine how each one relates to personality. De Montjoye is careful to point out that there isn’t a one-to-one match between a specific observed behavior and a specific personality trait. So if your radius of gyration, for instance, is particularly large, that doesn’t on its own serve as a clear indicator of neuroticism. Rather, it’s the combination of behaviors and the strength of the available data that allow the model to make predictions.
“We let the algorithm determine the right mix,” he said. “Each indicator is useful but is conditional on all the other indicators. That doesn’t mean each one is causal or that people who travel more are neurotic. Let’s say that the relationships between A and B are not linear. If you do a linear [regression], you see no relationship; [if] you do a quadratic [regression], you do see how A can predict B.”
The model, in other words, can’t tell you which behavior to change to make your personality less predictable.
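De Montjoye’s A-and-B example is easy to reproduce. In the minimal Python sketch below, built on toy data rather than anything from the study, B is exactly A squared over a range symmetric around zero. The Pearson correlation between A and B, whose square equals the R² of a straight-line fit, comes out to zero, yet fitting B against A² captures the relationship perfectly. The `pearson` helper is written out by hand for illustration.

```python
import math


def pearson(a: list[float], b: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


xs = [float(v) for v in range(-3, 4)]  # A: -3 .. 3, symmetric around zero
ys = [x ** 2 for x in xs]              # B depends entirely on A, but not linearly

linear_r = pearson(xs, ys)                       # straight-line fit: no relationship
quadratic_r = pearson([x ** 2 for x in xs], ys)  # squared term: perfect relationship
print(f"linear r = {linear_r:.3f}, quadratic r = {quadratic_r:.3f}")
```

This prints `linear r = 0.000, quadratic r = 1.000`. The same logic, scaled up to many interacting indicators, is why de Montjoye lets the algorithm determine the mix rather than reading any single behavior on its own.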
Here’s what it can do: predict personality much better than random guessing. When the researchers compared the model’s predictions with each subject’s actual personality, as revealed by the survey, they found that the model outperformed random guessing on all five traits: about 42 percent on average, and as high as 63 percent.
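A “percent better than random” figure is a relative comparison, not a raw accuracy. The sketch below assumes, since the article does not say, that the comparison is relative improvement over uniform random guessing, and that each trait is split into three equally likely classes such as low, average and high. Both are illustrative assumptions, and the helper names are hypothetical. Under them, being 42 percent better than chance corresponds to roughly 47 percent accuracy.

```python
# Hypothetical helpers: assume "better than random" means relative improvement
# over a uniform random guess among n_classes equally likely labels.


def percent_better_than_random(accuracy: float, n_classes: int) -> float:
    """Relative improvement (in percent) of accuracy over chance, 1 / n_classes."""
    chance = 1.0 / n_classes
    return (accuracy - chance) / chance * 100.0


def accuracy_for_improvement(percent_better: float, n_classes: int) -> float:
    """Inverse of the above: the accuracy implied by a given percent improvement."""
    chance = 1.0 / n_classes
    return chance * (1.0 + percent_better / 100.0)


# Assumed three-class setup (e.g. low / average / high on a trait).
print(f"{accuracy_for_improvement(42, 3):.3f}")     # accuracy implied by +42% over chance
print(f"{percent_better_than_random(0.5, 3):.1f}")  # a 50%-accurate model vs. chance
```

This prints 0.473 and 50.0. With a different number of classes, the same relative improvement would imply a different raw accuracy, which is why the baseline matters when reading such figures.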
The paper was published in the Proceedings of the 6th International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction.
“We see a lot of comments along the lines of ‘It’s only metadata. It’s not personal. And it only gets personal when a human looks into it,’” said de Montjoye. “We wanted to show an example at a small scale of what you might be able to do” with that data on how long calls last, when they are made and where.
“At the end of the day, the vast majority of the use of this data is extremely positive,” said de Montjoye, citing the utility of metadata in city planning, emergency response and other areas. He said he wanted to help researchers and the public develop a better “understanding of what can be done as well as the limits of privacy. This is really why we do this.”
From a national security perspective, metadata remains a powerful tool for finding links between people, including potential enemies. But despite the reassurances of Ruppersberger, President Barack Obama and others that the data isn’t “personal,” it can easily open windows into private lives.