AI Can Now Identify People in Anonymized Datasets

It is no surprise that many companies collect information about users and sell it to interested parties. Though this information must have identifying details removed, artificial intelligence is still able to pick a target out of over 40,000 anonymous users with a success rate of over 50%.

Phone Service Users

Last year, researchers from universities in London and Switzerland published a paper titled “Interaction data are identifiable even across long periods of time.” In the study, the researchers asked whether the patterns in how people socialize can be used to re-identify them in supposedly de-identified datasets.

To answer this question, the researchers trained a neural network to recognize patterns in users’ weekly social interactions. In the first test, they trained the network on 14 weeks of interactions from 43,606 mobile phone service subscribers. The data included each interaction’s date, time, duration, and type (call or text), the pseudonyms of all those involved, and who initiated the communication.

Then, the interaction data for each user was organized into graphs with nodes representing the user and their contacts, and edges representing the interactions. This graph was provided to the neural network to find graphs in the anonymized dataset that most closely resembled the target graph.
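The graph construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers' code: the `Interaction` record, its field names, and the `weekly_ego_graphs` helper are hypothetical, and timestamps are assumed to be seconds counted from the start of the dataset.

```python
from collections import defaultdict
from dataclasses import dataclass

SECONDS_PER_WEEK = 7 * 24 * 3600


@dataclass(frozen=True)
class Interaction:
    # Hypothetical record layout; the field names are assumptions.
    initiator: str   # pseudonym of whoever started the call or text
    receiver: str    # pseudonym of the other party
    timestamp: int   # seconds since the start of the dataset
    duration: int    # seconds (0 for a text)
    kind: str        # "call" or "text"


def weekly_ego_graphs(records, user):
    """Group one user's interactions into per-week graphs.

    Returns {week_index: {contact: [Interaction, ...]}}. Each contact
    is a node connected to the user, and each list holds the
    interactions that make up the edge between them in that week.
    """
    graphs = defaultdict(lambda: defaultdict(list))
    for rec in records:
        if user not in (rec.initiator, rec.receiver):
            continue  # this interaction does not involve the user
        contact = rec.receiver if rec.initiator == user else rec.initiator
        week = rec.timestamp // SECONDS_PER_WEEK
        graphs[week][contact].append(rec)
    # Convert the nested defaultdicts to plain dicts for the caller.
    return {week: dict(contacts) for week, contacts in graphs.items()}


records = [
    Interaction("a", "b", 10, 60, "call"),
    Interaction("c", "a", SECONDS_PER_WEEK + 5, 0, "text"),
    Interaction("b", "c", 20, 30, "call"),  # does not involve "a"
]
graphs = weekly_ego_graphs(records, "a")
```

Here `graphs` holds two weekly graphs for user `a`: one with an edge to `b` and one with an edge to `c`. Keeping the full records on each edge preserves direction, duration, and type, which are the kinds of details the article says were available to the neural network.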

Researchers then determined how successful the neural network was in matching a target to their anonymized data in three cases:

  • One week after the latest records
  • Given information about the interactions of a target’s contacts along with the target’s interactions
  • 20 weeks after the latest records

One week after the latest records

When shown a graph of a target’s interactions that occurred one week after the latest record in the anonymized dataset, the neural network correctly identified the target only 14.7% of the time.

Given information about the interactions of a target’s contacts

This case was the most successful, with the neural network correctly identifying 52.4% of people. This makes sense as the neural network has more information about each individual.

20 weeks after the latest records

Given data about the target’s and their contacts’ interactions collected 20 weeks after the latest records in the anonymized dataset, the neural network was still able to correctly identify individuals 24.3% of the time. This suggests that social interaction behavior remains identifiable for long periods of time.
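To make the percentages above concrete, here is a minimal sketch of how such an identification rate could be computed. It assumes the reported figures are top-1 match rates, and it swaps the study's neural network for plain cosine similarity over hand-made feature vectors. The function names and the toy vectors are hypothetical and are not taken from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify(target_vec, candidates):
    """Return the pseudonym whose vector is most similar to the target."""
    return max(candidates, key=lambda p: cosine(target_vec, candidates[p]))


def top1_accuracy(targets, candidates):
    """Fraction of targets whose best match is their own pseudonym."""
    hits = sum(1 for p, vec in targets.items() if identify(vec, candidates) == p)
    return hits / len(targets)


# Toy stand-ins for learned representations of each user's graph.
# "candidates" plays the role of the anonymized dataset, and "targets"
# plays the role of later observations of the same three people.
candidates = {"u1": [5, 1, 0], "u2": [0, 3, 4], "u3": [1, 1, 1]}
targets = {"u1": [4, 1, 0], "u2": [0, 2, 5], "u3": [5, 1, 1]}

accuracy = top1_accuracy(targets, candidates)
```

In this toy data, `u1` and `u2` behave consistently over time and are matched correctly, while `u3` changes behavior and is mistaken for `u1`, so `accuracy` comes out to two thirds. The same idea underlies the study's results: the more stable a person's interaction pattern is, the easier that person is to pick out again later.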

University Students

To test whether their results could be replicated elsewhere, the researchers repeated the study using four weeks of close-proximity data from the phones of 587 anonymous university students.

The data included pseudonyms for the students, encounter time, and strength of the received signal (indicating the proximity to other students). 

In this study, the neural network was able to correctly identify students in the dataset 26.4% of the time.

Check out the full paper.

Ultimately, the study’s findings suggest the need to improve methods to maintain the anonymity of individuals and protect their privacy. Please feel free to comment if you have any questions or if you enjoyed this article!

Aviral Mehrotra

Hello, I'm Aviral, and I am a college student studying computer science and business economics. I am passionate about programming, artificial intelligence, the latest technological trends, finance, economics, and personal growth. I have been coding for just about four years now, and I am knowledgeable in Python, Java, C++, ReactJS, and Swift. In my free time, I enjoy creating different programming projects that allow me to learn, whether it be creating iOS apps using SwiftUI and Xcode or developing web apps using ReactJS and Python.
