--- title: "Privacy fails through data aggregation" date: 2016-10-13T10:56:00+06:00 draft: false tags: ["data privacy","surveillance","three-letter-agencies"] author: "Olivier Falcoz" hidemeta: false ShowReadingTime: true ShowPostNavLinks: true showtoc: false cover: image: "/images/" alt: "" caption: "" --- “Aggregating” or combining data from multiple sources can actually reveal surprisingly specific information. You might not work for the Pentagon, but your data can be aggregated in the same way to [de-anonymize](https://en.wikipedia.org/wiki/De-anonymization) you. Here’s a small collection of these surprising privacy failures: * The Classic Paper – [Simple Demographics Often Identify People Uniquely](http://dataprivacylab.org/projects/identifiability/paper1.pdf) shows that knowing just birth date, gender, and zip code is enough to uniquely identify most people. * Netflix Debacle – An *anonymous* Netflix dataset was [de-anonymized by correlating it with the IMDB](https://www.cs.utexas.edu/~shmat/shmat_oak08netflix.pdf) database. * Social Exposure – [De-anonymizing social networks](https://www.cs.utexas.edu/~shmat/shmat_oak09.pdf) (by Arvind Narayanan) demonstrates how an *anonymous* Twitter graph can be re-identified using Flickr for auxiliary information. * Your Words Betray You – Your choice of words in writing [can be analyzed](http://33bits.org/2012/02/20/is-writing-style-sufficient-to-deanonymize-material-posted-online/) to uniquely identify you according to [On the feasibility of Internet-Scale Author Identification](http://randomwalker.info/publications/author-identification-draft.pdf). * Location, Location, Location – The traces of your GPS location app, even your approximate location, is pretty unique. Outlined in [Unique in the crowd, the privacy bounds of human mobility](http://www.nature.com/articles/srep01376). * Bitcoin is often thought of as an anonymous currency, but it’s [surprisingly non-anonymous](https://coincenter.org/2015/01/anonymous-bitcoin/), considering its reputation. This is because a lot of information is contained in the *public ledger* that records all transactions. See also [An analysis of Anonymity in the Bitcoin System](http://arxiv.org/pdf/1107.4524). Source: [Tozny Blog](https://tozny.com/blog/10-unnerving-privacy-fails-thru-data-aggregation/)