How the NSA Can Use Metadata to Predict Your Personality


Despite assurances that metadata is free of content, new research shows that it can be highly personal.

By Patrick Tucker

The president and congressional leaders want to end NSA bulk metadata collection, but not the use of metadata, which may even be expanded. From a technical perspective, the question of what your metadata can reveal about you, or about potential enemies, is as important now as at any time since the Edward Snowden scandal. The answer is more than you might think.

First, the background. On Thursday, the Obama administration released a brief statement on ending the collection of metadata and limiting, slightly, the circumstances under which metadata could be accessed. The timing was in keeping with a self-imposed deadline to create legislation to address NSA bulk collection. The statement said “the government will not collect these telephone records in bulk; rather, the records would remain at the telephone companies for the length of time they currently do today.” 

Two leaders of the House Intelligence Committee, Reps. Michael Rogers, R-Mich., and Dutch Ruppersberger, D-Md., are also putting forward a proposal, called the “End Bulk Collection Act,” which would likewise shift the collection of bulk metadata from the NSA to phone companies.

The companies would be required to keep the data no longer than 18 months, as opposed to the five years for which the NSA currently holds it. But the House bill would also loosen the standard under which the government could access metadata, from probable cause to the far more nebulous “reasonable articulable suspicion.”

In a USA Today op-ed from last July, Ruppersberger argued that the practice of collecting metadata was benign.

“The phone-records tool is not some wildly intrusive surveillance program. In reality, what we are talking about is collection of ‘metadata,’ not content. No names, no addresses and absolutely no conversations,” he wrote.

But is it?

Recent research shows that the sort of metadata the NSA uses in its investigations is actually highly personal.

A group of researchers from the MIT Media Lab found that your metadata — including, but not limited to, the way in which you use your phone, how you make calls, to whom, for how long, etc. — can serve as an indicator of your personality.

Here’s how they figured it out. The researchers, Yves-Alexandre de Montjoye, Jordi Quoidbach, Florent Robic and Sandy Pentland, had 100 students fill out surveys to determine their personality along five distinct personality types:

  • Neurotic: Defined roughly as a higher than normal tendency to experience unpleasant emotions
  • Open: Defined as broadly curious and creative
  • Extroverted: As in, looks toward others for stimulation
  • Agreeable: As in warm, compassionate, and cooperative
  • Conscientious: Self-disciplined, organized, and eager for success

These types are in keeping with the so-called Five Factor Model of Personality, a widely used method for describing personality traits. Once the researchers had the survey data to show where each of the subjects fell along each spectrum, they examined the subjects’ phone records between March 2010 and June 2011, a span well within the new 18-month window. Specifically, they looked at these metadata elements:

  • Basic phone use including the number of calls
  • Active user behaviors, as in the number of calls initiated, and the time it took the subject to answer a text
  • Location, or how far the subject moved, the number of places from which calls have been made, and other indicators of so-called radius of gyration
  • Regularity of calling routine
  • Diversity, defined as the ratio between the subject’s total number of contacts and the relative frequency at which he or she interacts with them
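
A few of the measures above can be sketched in Python. This is only an illustration under stated assumptions: the `CallRecord` layout, the flat x/y coordinates in kilometers, and the sample records are all hypothetical, and the article does not give the researchers' exact formulas. Radius of gyration is computed here in its usual sense, the root-mean-square distance of a person's recorded locations from their average location.

```python
import math
from collections import namedtuple

# Hypothetical record layout for illustration; real call records differ.
CallRecord = namedtuple("CallRecord", "contact outgoing duration_s x_km y_km")


def radius_of_gyration(points):
    """Root-mean-square distance of the points from their centroid."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n)


def summarize(records):
    """Reduce a list of call records to a handful of behavioral indicators."""
    points = [(r.x_km, r.y_km) for r in records]
    return {
        "calls": len(records),
        "calls_initiated": sum(1 for r in records if r.outgoing),
        "distinct_contacts": len({r.contact for r in records}),
        "distinct_places": len(set(points)),
        "radius_of_gyration_km": radius_of_gyration(points),
    }


# Made-up records: two calls from one place, two from another 10 km away.
records = [
    CallRecord("alice", True, 120, 0.0, 0.0),
    CallRecord("bob", False, 45, 0.0, 0.0),
    CallRecord("alice", True, 300, 6.0, 8.0),
    CallRecord("carol", True, 60, 6.0, 8.0),
]
features = summarize(records)
print(features)
```

None of these values contains a word of conversation, yet together they already describe how active, how mobile, and how socially varied the caller is.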

Once the researchers had values for these behaviors, they ran the results through a machine-learning algorithm to determine how each one relates to personality type. De Montjoye is careful to point out that there isn’t a one-to-one matchup between a specific observed behavior and a specific personality. So if your radius of gyration, for instance, is particularly large, that doesn’t serve as a clear indicator of neuroticism. Rather, it’s the combination of behaviors and the strength of the data available that allows the model to come up with predictions.

“We let the algorithm determine the right mix,” he said. “Each indicator is useful but is conditional on all the other indicators. That doesn’t mean each one is causal or that people who travel more are neurotic. Let’s say that the relationships between A and B are not linear, if you do a linear progression you see no relationship; you do a quadratic progression, you do see how A can predict B.”
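
The distinction de Montjoye draws between a linear and a quadratic fit can be shown with a toy example. The data below are invented for illustration (B is simply A squared) and have nothing to do with the study's records; `r_squared` is a hypothetical helper that reports how much of B's variation a straight-line fit on a given input explains.

```python
def r_squared(xs, ys):
    """R^2 of the least-squares line ys ~ intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - residual / syy


# Toy data: B depends entirely on A, but not along a straight line.
a = [-3, -2, -1, 0, 1, 2, 3]
b = [x * x for x in a]

linear_fit = r_squared(a, b)                       # fit B against A
quadratic_fit = r_squared([x * x for x in a], b)   # fit B against A squared

print(f"linear R^2:    {linear_fit:.2f}")
print(f"quadratic R^2: {quadratic_fit:.2f}")
```

The straight-line fit explains none of the variation in B, while the fit on the squared term explains all of it. A model limited to linear relationships would conclude that A says nothing about B.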

The model, in other words, can’t tell you which behavior to change to make your personality less predictable.

Here’s what it can do: predict personality type much better than random guessing. When the researchers compared the model’s guesses for each subject’s personality against the survey results, and then against random assumptions, they found that the model performed much better on all of the personality types: about 42 percent better on average and as much as 63 percent better.
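
One way to read figures like these is as relative improvement over a random baseline. That reading is an assumption, since the article does not give the formula, and the accuracy numbers below are hypothetical, chosen only to show the arithmetic.

```python
def relative_improvement(model_accuracy, baseline_accuracy):
    """Fraction by which the model's accuracy exceeds the baseline's."""
    return (model_accuracy - baseline_accuracy) / baseline_accuracy


# Hypothetical numbers: a random guesser right 50% of the time,
# and a model right 71% of the time.
gain = relative_improvement(0.71, 0.50)
print(f"{gain:.0%} better than random")
```

Under this reading, "42 percent better" describes the size of the gap relative to guessing, not the share of subjects classified correctly.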

The paper was published in the Proceedings of the 6th International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction.

“We see a lot of comments along the lines of ‘It’s only metadata. It’s not personal. And it only gets personal when a human looks into it,’” said de Montjoye. “We wanted to show an example at a small scale of what you might be able to do” with that data on how long calls last, when they are made, and where.

“At the end of the day, the vast majority of the use of this data is extremely positive,” said de Montjoye, citing the utility of metadata in city planning, emergency response and other areas. He said he wanted to help researchers and the public develop a better “understanding of what can be done as well as the limits of privacy. This is really why we do this.”

From a national security perspective, the use of metadata remains a powerful tool for finding links between people, including potential enemies. However, despite the reassurances of Ruppersberger, President Barack Obama and others that the data isn’t “personal,” it lends itself easily to creating windows into private lives.
