Drudge Retort: The Other Side of the News
Monday, September 20, 2021

Publishing data of all kinds offers big benefits for government, academic, and business users. Regulators demand that we make that data anonymous to deliver its benefits while protecting personal privacy. But what happens when people read between the lines?




More from the article...

...Making data anonymous is known as de-identifying it, but doing it properly is more challenging than it seems, says Wei Wang, professor of computer science and director of the Scalable Analytics Institute at UCLA.

"It's one thing to remove the identity, but we also need to keep in mind that the remaining data right after we remove that entity is still useful," she says.

With a little work, people can often recreate your identity from these remaining data points. This process is called re-identification, and it can ruin lives.

In a recent case, an online newsletter outed a Catholic priest who was a frequent user of the Grindr gay hookup app. The newsletter purchased the Grindr usage data from a third-party data broker. Even though the data set had no identifying information, the newsletter found him using his device ID and location data. The ID showed up in gay bars, his work address, and family addresses, which was enough to find his name and out him. He later resigned.

The spectre of re-identification has grave implications for us all, and should give us pause as we rush to publish anonymous data sets. It has become a sport for some researchers, such as those who mined anonymous AOL search queries in 2006 and identified individuals from de-identified Netflix usage data. Both organisations had published the data in the name of research....

#1 | Posted by LampLighter at 2021-09-20 11:09 AM

Yes, this is some scary stuff. I might add that if you connect enough sources of data together, you can rebuild a whole lot of information. It is all out there, and it is really cheap.

Note: Sounds like someone had it out for the priest in the article...

#2 | Posted by GalaxiePete at 2021-09-20 06:14 PM | Newsworthy 1
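
To illustrate the point in #2 about connecting data sources, here is a minimal Python sketch of a linkage attack. Everything in it is hypothetical: the records, the names, and the choice of zip, birth_year, and sex as the shared fields are assumptions made for illustration, not data from the article.

```python
# Hypothetical "anonymous" release: names removed, other fields kept.
released = [
    {"zip": "90024", "birth_year": 1975, "sex": "F", "diagnosis": "asthma"},
    {"zip": "90024", "birth_year": 1982, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "90210", "birth_year": 1975, "sex": "F", "diagnosis": "migraine"},
]

# Hypothetical second source that still carries names (a public list, say).
public = [
    {"name": "Alice Example", "zip": "90024", "birth_year": 1975, "sex": "F"},
    {"name": "Bob Example", "zip": "90024", "birth_year": 1982, "sex": "M"},
    {"name": "Carol Example", "zip": "90210", "birth_year": 1990, "sex": "F"},
]

# Fields present in both sources; assumed here to act as quasi-identifiers.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def link(released_rows, public_rows, keys=QUASI_IDENTIFIERS):
    """Return (name, diagnosis) pairs where the shared fields match exactly one person."""
    index = {}
    for person in public_rows:
        key = tuple(person[k] for k in keys)
        index.setdefault(key, []).append(person["name"])

    matches = []
    for row in released_rows:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # a unique match means the row is re-identified
            matches.append((names[0], row["diagnosis"]))
    return matches


reidentified = link(released, public)
print(reidentified)
```

In this toy data, two of the three "anonymous" rows match exactly one named person. The more fields the two sources share, the more rows tend to become unique.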

I work in this field. If data is useful, it is likely re-identifiable, especially if the granularity is at the event level.

I am working on ways to synthesize data for public sharing so researchers can test their queries against it. Then, when they are ready, they can ask us to run the code on the real data and release only highly aggregated results.

#3 | Posted by bored at 2021-09-21 01:37 AM
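
The "highly aggregated results" idea in #3 could look something like the Python sketch below. It is not the poster's actual system; the function name, the field names, and the minimum cell size of 3 are all assumptions chosen for illustration. It turns event-level rows into per-group counts of distinct users and drops any group that is too small to publish.

```python
MIN_CELL_SIZE = 3  # hypothetical threshold; a real policy would set its own


def aggregated_release(events, group_key, min_cell_size=MIN_CELL_SIZE):
    """Count distinct users per group and suppress groups below the threshold."""
    users_per_group = {}
    for event in events:
        users_per_group.setdefault(event[group_key], set()).add(event["user_id"])

    return {
        group: len(users)
        for group, users in users_per_group.items()
        if len(users) >= min_cell_size  # small cells are withheld entirely
    }


# Hypothetical event-level data: one row per event, not per person.
events = [
    {"user_id": 1, "city": "Los Angeles"},
    {"user_id": 2, "city": "Los Angeles"},
    {"user_id": 3, "city": "Los Angeles"},
    {"user_id": 3, "city": "Los Angeles"},  # repeat event, same user
    {"user_id": 4, "city": "Fresno"},       # only one user: suppressed
]

result = aggregated_release(events, "city")
print(result)
```

Suppressing small groups reduces the risk but does not guarantee privacy on its own, since several aggregate releases can sometimes be compared against each other to isolate an individual.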


