This workshop will provide data privacy and security professionals and legal counsel with an overview of the regulatory provisions on de-identification and data minimization found around the globe, as well as an introduction to the principles and methods of statistical disclosure limitation that can be used to meet them. It will delve into the regulatory gaps and operational ambiguities created by these requirements, focusing in particular on de-identification challenges related to unstructured and complex information sources (e.g., genomic data, images, social media posts, free text, and audio recordings) and the increasing availability of publicly accessible data.

Course Objectives
Participants will be able to:

1. recognize the nuances in de-identification definitions in different regulations and the ways in which the use of de-identification can help meet regulatory obligations
2. understand the process of statistical de-identification and how it fits into a formal data management procedure
3. understand the considerations and methodologies for balancing data privacy protections and preserving data utility
4. explain the various approaches to disclosure risk analysis and the differences between them

The course will cover three topics: regulatory trends in the promotion of data minimization and de-identification, including GDPR, HIPAA, and CCPA; statistical methodologies for disclosure risk analysis and privacy protection; and the operational impact of publicly available datasets and non-traditional data types on data management programs.

In addition to discussing the ways in which data minimization requirements overlap and differ, participants will learn the basics of statistical disclosure risk analysis. Drawing on examples of methodologies employed for healthcare microdata, the course will cover data intrusion scenarios, the importance of both sample and population uniqueness, record linkage methods, formulations of re-identification risk, k-anonymity and other de-identification approaches, and the definition of quasi-identifiers and the significance of their classification.
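As a rough illustration of two of these ideas, the sketch below groups a small, entirely hypothetical microdata sample by its quasi-identifiers, reports the smallest group size (the k in k-anonymity), and lists the sample-unique records. The field names, values, and helper functions are assumptions made for this example and are not taken from the course materials.

```python
from collections import Counter

# Hypothetical toy microdata; field names and values are illustrative only.
records = [
    {"zip3": "100", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"zip3": "100", "birth_year": 1980, "sex": "F", "diagnosis": "diabetes"},
    {"zip3": "100", "birth_year": 1975, "sex": "M", "diagnosis": "asthma"},
    {"zip3": "100", "birth_year": 1975, "sex": "M", "diagnosis": "flu"},
    {"zip3": "112", "birth_year": 1990, "sex": "F", "diagnosis": "flu"},
]

# Assumed classification: these fields are treated as quasi-identifiers.
QUASI_IDENTIFIERS = ("zip3", "birth_year", "sex")


def qi_key(row, quasi_identifiers):
    """Return the tuple of quasi-identifier values for one record."""
    return tuple(row[q] for q in quasi_identifiers)


def equivalence_class_sizes(rows, quasi_identifiers):
    """Count how many records share each quasi-identifier combination."""
    return Counter(qi_key(row, quasi_identifiers) for row in rows)


def k_anonymity(rows, quasi_identifiers):
    """Smallest equivalence class size; 0 for an empty dataset."""
    sizes = equivalence_class_sizes(rows, quasi_identifiers)
    return min(sizes.values(), default=0)


def sample_uniques(rows, quasi_identifiers):
    """Records whose quasi-identifier combination appears exactly once."""
    sizes = equivalence_class_sizes(rows, quasi_identifiers)
    return [row for row in rows if sizes[qi_key(row, quasi_identifiers)] == 1]


if __name__ == "__main__":
    print("k =", k_anonymity(records, QUASI_IDENTIFIERS))
    print("sample uniques:", sample_uniques(records, QUASI_IDENTIFIERS))
```

A record that is unique in the sample is not necessarily unique in the underlying population, which is why disclosure risk analysis considers both kinds of uniqueness.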

Upon completion of the course, participants will be able to begin developing successful and statistically adequate de-identification programs for their data sets, thereby permitting them to comply with the growing number of information regulations while still preserving the analytic utility of their data.

Dr. Daniel Barth-Jones, Assistant Professor of Epidemiology, Mailman School of Public Health at Columbia University
Patsy Bailin, Head of Privacy & Data Ethics, Datavant
Mike Hintze, Partner, Hintze Law

Room 302

Readings:

Patsy Bailin

Head of Privacy & Data Ethics
Datavant

Mike Hintze

Partner
Hintze Law

Daniel Barth-Jones

Professor
Mailman School of Public Health
Columbia University