Privacy Plus+: Is it time to regulate scrubbed data?
Privacy, Technology and Perspective
Is it time to regulate scrubbed data? De-identification, or scrubbing, is the process of removing identifying information from data, using methods like “(1) removing data; (2) replacing data with pseudonyms; (3) adding statistical ‘noise;’ and (4) aggregation.” See Boris Lubarsky, Re-Identification of “Anonymized” Data, 1 GEO. L. TECH. REV. 202 (April 2017), available at https://georgetownlawtechreview.org/re-identification-of-anonymized-data/GLTR-04-2017/. And what an attractive thought! If you strip away all the clues that could point to someone’s identity, isn’t their privacy protected per se, leaving nothing to worry about? In fact, once the data is devoid of identity markers or clues, it is no longer even “personal information,” or (practically by definition) “personally identifiable information.” So goes the thought behind much HIPAA analysis, at least. (The HIPAA Privacy Rule permits certain uses and disclosures of Protected Health Information when it is not individually identifiable, §164.502(d) of the Privacy Rule, and sets forth a de-identification standard and implementation specifications, §164.514(a)-(b).)
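For readers curious what the four scrubbing methods Lubarsky lists look like in practice, here is a minimal Python sketch applied to a few made-up records. Everything in it is hypothetical and illustrative: the field names, the secret key, and the choices of a keyed hash (HMAC-SHA256) for pseudonyms and Laplace noise for perturbation are our assumptions, not a method prescribed by Lubarsky or by the HIPAA de-identification standard.

```python
import hashlib
import hmac
import random
from collections import Counter

# Hypothetical toy records; every field name and value is illustrative only.
RECORDS = [
    {"name": "Alice Smith", "zip": "75201", "age": 34, "diagnosis": "flu"},
    {"name": "Bob Jones", "zip": "75204", "age": 51, "diagnosis": "asthma"},
    {"name": "Carol White", "zip": "75201", "age": 29, "diagnosis": "flu"},
]

def remove_fields(records, fields):
    """(1) Removing data: drop direct identifiers entirely."""
    return [{k: v for k, v in r.items() if k not in fields} for r in records]

def pseudonymize(records, field, secret_key):
    """(2) Replacing data with pseudonyms: a keyed hash stands in for the value."""
    out = []
    for r in records:
        r = dict(r)  # copy so the input records are left unchanged
        digest = hmac.new(secret_key, r[field].encode("utf-8"), hashlib.sha256).hexdigest()
        r[field] = digest[:12]  # truncated hash used as the pseudonym
        out.append(r)
    return out

def add_noise(records, field, scale, rng):
    """(3) Adding statistical noise: perturb a numeric field with Laplace noise."""
    out = []
    for r in records:
        r = dict(r)
        # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
        noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
        r[field] = round(r[field] + noise)
        out.append(r)
    return out

def aggregate(records, field):
    """(4) Aggregation: publish only counts per group, never individual rows."""
    return dict(Counter(r[field] for r in records))

rng = random.Random(0)
print(remove_fields(RECORDS, {"name"}))
print(pseudonymize(RECORDS, "name", b"hypothetical-secret"))
print(add_noise(RECORDS, "age", scale=2.0, rng=rng))
print(aggregate(RECORDS, "diagnosis"))  # {'flu': 2, 'asthma': 1}
```

Note what survives each step: after removing names, ZIP code and age remain in the rows, and those leftover attributes are exactly the "clues" the rest of this post is worried about.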
The problem with scrubbing data is twofold. First, de-identification is extremely hard to do well, and poorly de-identified data may create serious privacy and other risks for the individuals concerned, while exposing the entity that scrubbed the data to significant legal and reputational harm. Second, the risk of re-identification is real, especially in this era of big data analytics and increasingly powerful computing, because datasets that are ostensibly de-identified can increasingly be made identifiable by linking them with other datasets.
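To make the "connecting datasets" point concrete, the following is a minimal Python sketch of a linkage attack. Both datasets are invented for illustration: a "scrubbed" release with names removed but ZIP code, birth date, and sex kept, and a hypothetical named roster sharing those same fields. Joining on the shared fields (often called quasi-identifiers) re-identifies any scrubbed record that matches exactly one named person.

```python
from collections import defaultdict

# Hypothetical "de-identified" release: names removed, quasi-identifiers kept.
SCRUBBED = [
    {"zip": "75201", "birth_date": "1990-04-12", "sex": "F", "diagnosis": "asthma"},
    {"zip": "75204", "birth_date": "1973-09-30", "sex": "M", "diagnosis": "diabetes"},
    {"zip": "75201", "birth_date": "1985-01-05", "sex": "F", "diagnosis": "flu"},
]

# Hypothetical auxiliary dataset that does carry names (e.g., a public roster).
AUXILIARY = [
    {"name": "Alice Smith", "zip": "75201", "birth_date": "1990-04-12", "sex": "F"},
    {"name": "Bob Jones", "zip": "75204", "birth_date": "1973-09-30", "sex": "M"},
    {"name": "Dana Lee", "zip": "75201", "birth_date": "1985-01-05", "sex": "F"},
    {"name": "Erin Park", "zip": "75201", "birth_date": "1985-01-05", "sex": "F"},
]

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")

def link(scrubbed, auxiliary, keys=QUASI_IDENTIFIERS):
    """Join two datasets on shared quasi-identifiers.

    Returns (name, scrubbed_record) pairs only where exactly one auxiliary
    person matches, i.e. where the scrubbed record is re-identified.
    """
    index = defaultdict(list)
    for person in auxiliary:
        index[tuple(person[k] for k in keys)].append(person["name"])
    matches = []
    for record in scrubbed:
        candidates = index.get(tuple(record[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match means the record is re-identified
            matches.append((candidates[0], record))
    return matches

for name, record in link(SCRUBBED, AUXILIARY):
    print(f"{name} -> {record['diagnosis']}")
# Alice Smith -> asthma
# Bob Jones -> diabetes
```

The third scrubbed record survives only because two people in the roster happen to share its ZIP code, birth date, and sex. Each additional auxiliary dataset an attacker can obtain shrinks that kind of protective ambiguity.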
In 2015, Scott Berinato published an interesting article in the Harvard Business Review under the provocative title “There’s No Such Thing as Anonymous Data,” https://hbr.org/2015/02/theres-no-such-thing-as-anonymous-data, describing the work of MIT scientist Yves-Alexandre de Montjoye. De Montjoye showed that just four transactions from anonymized credit card data, showing only dates and locations, were enough to single out a unique buyer 90% of the time. Berinato also reports that de Montjoye was drawn to a then-new model proposed by Paul M. Schwartz and Daniel Solove, which places data along a “spectrum” on which re-identification is more or less probable (rather than the binary, “it’s-either-possible-or-it-isn’t” thinking that prevails today).
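The intuition behind the "four transactions" finding can be sketched in a few lines of Python: for each person, pick a handful of their own (date, place) points and ask how many people in the dataset share all of them. This is a deliberately simplified, hypothetical illustration of the general idea (sometimes called "unicity"), not de Montjoye's actual methodology; the traces, shop names, and the `unicity` function are all invented here.

```python
import random

# Hypothetical transaction traces: user -> set of (date, shop) points.
TRACES = {
    "u1": {("03-01", "bakery"), ("03-02", "gym"), ("03-04", "cafe"), ("03-05", "bookstore")},
    "u2": {("03-01", "bakery"), ("03-02", "gym"), ("03-03", "cafe"), ("03-05", "pharmacy")},
    "u3": {("03-01", "bakery"), ("03-03", "cafe"), ("03-04", "cafe"), ("03-05", "bookstore")},
}

def unicity(traces, k, rng):
    """Estimate the share of users uniquely pinned down by k of their own points.

    For each user, draw k points from that user's trace and count how many
    users' traces contain all k points; a count of 1 means the user is unique.
    """
    unique = 0
    for points in traces.values():
        sample = set(rng.sample(sorted(points), k))
        matches = sum(1 for other in traces.values() if sample <= other)
        if matches == 1:
            unique += 1
    return unique / len(traces)

print(unicity(TRACES, 1, random.Random(0)))  # 0.0 or 0.333..., depending on the point drawn
print(unicity(TRACES, 4, random.Random(0)))  # 1.0
```

In this toy data, a single point rarely singles anyone out, because most points are shared, while four points identify every user. That is the "spectrum" idea in miniature: re-identification becomes more probable as more seemingly innocuous details are kept.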
It seems to us that there is still a “binary” aspect to all this, but of a different kind. Current laws rest on a faulty premise: that scrubbing is a cure-all for privacy risk. It isn’t. In his law review article cited above, Lubarsky gives numerous examples in which scrubbing failed and re-identification of scrubbed data compromised privacy. At the same time, releasing certain “de-identified” datasets, such as those used for scientific research, carries public benefits, and those benefits must be weighed against the privacy risks posed by potential re-identification. Accordingly, we think scrubbed data is a proper subject for regulation, which should at least limit its use and distribution and give data subjects the right to know which of their data has been scrubbed, by whom, why, and with whom it has been shared.
Hosch & Morris, PLLC is a Dallas-based boutique law firm dedicated to data protection, privacy, the Internet and technology. Open the Future℠.