Chapter 3: Introduction to Technological Aspects of Privacy
Reidentification Risk and Differential Privacy
Computer scientists have repeatedly re-identified supposedly anonymized data. Differential privacy is a mathematical definition of privacy: an analysis that satisfies it adds calibrated noise so that it yields essentially the same inference whether or not any individual's data is included. It was used in parts of the 2020 Census.
An early study by Latanya Sweeney uniquely identified the Massachusetts governor by matching supposedly anonymized records against voter files on only three fields: gender, ZIP code, and date of birth. The study spawned an academic field devoted to reidentification. New professionals should assume that some supposedly deidentified data may be re-identifiable.
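The matching step behind this kind of reidentification can be sketched in a few lines. The example below is only an illustration, not a reconstruction of the original study: all records, names, and field names are hypothetical, and it assumes exact matches on the three fields.

```python
from collections import Counter

# Hypothetical "deidentified" records: names removed, three fields kept.
health_records = [
    {"gender": "M", "zip": "02138", "dob": "1950-01-15", "diagnosis": "condition A"},
    {"gender": "F", "zip": "02138", "dob": "1962-03-02", "diagnosis": "condition B"},
    {"gender": "F", "zip": "02139", "dob": "1962-03-02", "diagnosis": "condition C"},
]

# Hypothetical public voter file: names plus the same three fields.
voter_file = [
    {"name": "Voter One", "gender": "M", "zip": "02138", "dob": "1950-01-15"},
    {"name": "Voter Two", "gender": "F", "zip": "02138", "dob": "1962-03-02"},
    {"name": "Voter Three", "gender": "F", "zip": "02139", "dob": "1971-11-30"},
    {"name": "Voter Four", "gender": "F", "zip": "02139", "dob": "1971-11-30"},
]

QUASI_IDENTIFIERS = ("gender", "zip", "dob")


def key(record):
    """Build the linking key from the three shared fields."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(records, voters):
    """Return (name, record) pairs whose key matches exactly one voter."""
    counts = Counter(key(v) for v in voters)
    unique_names = {key(v): v["name"] for v in voters if counts[key(v)] == 1}
    return [(unique_names[key(r)], r) for r in records if key(r) in unique_names]


matches = link(health_records, voter_file)
for name, record in matches:
    print(name, "->", record["diagnosis"])
```

In this toy data, two of the three records link to a single named voter. The third matches nobody, and voters who share all three fields with another voter are never linked. The risk comes from how often the combination of fields is unique, not from any single field.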
🔑 What differential privacy guarantees
Differential privacy guarantees that anyone seeing the result will make essentially the same inference about any individual whether or not that individual's data was in the input. It specifies how much statistical noise must be added to the answers to a set of queries. The U.S. Census Bureau used it for certain 2020 data but decided in 2022 that it would not yield sufficiently useful results for more complex datasets.
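One common way to add calibrated noise is the Laplace mechanism, sketched below for a simple counting query. This is a minimal illustration under stated assumptions, not the method used for the Census. The parameter `epsilon` (the standard privacy-loss parameter, where smaller means more noise), the helper names, and the sample data are all introduced here for illustration. Python's `random` module is also not suitable for production privacy systems.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon, rng):
    """Answer 'how many values satisfy predicate?' with Laplace noise.

    Adding or removing one person changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is sensitivity / epsilon.
    """
    true_count = sum(1 for v in values if predicate(v))
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)  # seeded only so the sketch is repeatable
ages = [34, 51, 29, 62, 47, 38, 55]  # hypothetical data


def is_50_or_older(age):
    return age >= 50


# Same query with and without the last person's data.
with_person = private_count(ages, is_50_or_older, epsilon=0.5, rng=rng)
without_person = private_count(ages[:-1], is_50_or_older, epsilon=0.5, rng=rng)
print(with_person, without_person)
```

The true counts differ by one (3 versus 2), but each released answer carries noise whose typical size is larger than that difference. An observer therefore cannot tell with confidence whether the last person was included. Averaged over many releases the noise cancels out, which is why a privacy budget must be tracked across a set of queries.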
Key terms - quick answers
What is “Differential privacy”?
A mathematical definition of privacy guaranteeing that anyone seeing a result will make essentially the same inference about an individual whether or not that person's data is in the input. It specifies the statistical noise needed to answer a set of queries.