Methods for De-identification of Protected Health Information

“De-identification” means protecting an individual’s privacy by removing identifying information from data. Data collected from human subjects may be de-identified to preserve confidentiality. Biological data can be de-identified to meet the requirements of the Health Insurance Portability and Accountability Act (HIPAA), the U.S. law that sets out patient privacy requirements.

The need for production data for software development

When developing software for a healthcare entity, the development team needs real-world production data. However, that data will be used in environments running on vendor equipment, where clients have no direct oversight or control. Entities must therefore remove all protected health information (PHI) from any data used outside client-controlled systems. HHS regulations dictate how PHI can be de-identified.

De-identification authorization

When data is to be shared with an off-site development team, an authorized person or group requests approval to begin de-identification. The request is filed as a change control in Salesforce, and three separate client contacts must approve it. The de-identification process can begin only once the change control is approved.

De-identification process

Following this approval, the production database is backed up and restored in a local QA environment, where a de-identification script then runs. The script first builds each table needed for the move to a development environment, then masks all personally identifiable information. Each patient first and last name is replaced by its first initial followed by random filler characters: David, for instance, becomes Dlhh3#4ad, and Anderson becomes Aasdf3453@$. The script also substitutes arbitrary dates for patients’ birthdates and sets every gender indicator to D.
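The masking rules described here can be sketched in a few lines of Python. This is only an illustration, not the actual script: the field names (first_name, last_name, birth_date, gender), the filler alphabet, the filler length, and the date range are all assumptions made for the example.

```python
import secrets
import string
from datetime import date, timedelta

# Characters used for the random filler; the exact alphabet is an assumption.
FILLER_ALPHABET = string.ascii_lowercase + string.digits + "#@$"


def mask_name(name: str, filler_length: int = 8) -> str:
    """Keep the first initial and replace the rest with random filler."""
    filler = "".join(secrets.choice(FILLER_ALPHABET) for _ in range(filler_length))
    return name[:1] + filler


def random_birthdate(start: date = date(1920, 1, 1),
                     end: date = date(2020, 12, 31)) -> date:
    """Pick an arbitrary date in [start, end], unrelated to the real birthdate."""
    span_days = (end - start).days
    return start + timedelta(days=secrets.randbelow(span_days + 1))


def deidentify_patient(record: dict) -> dict:
    """Return a copy of a patient record with the identifying fields masked."""
    masked = dict(record)  # leave the caller's record untouched
    masked["first_name"] = mask_name(record["first_name"])
    masked["last_name"] = mask_name(record["last_name"])
    masked["birth_date"] = random_birthdate()
    masked["gender"] = "D"  # every gender indicator is set to D
    return masked
```

Calling deidentify_patient on a record for David Anderson would return a copy whose names start with D and A followed by random filler, while non-identifying fields are carried over unchanged. The secrets module is used here rather than random so that the filler is not reproducible from a seed.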

Random characters replace the real address information in the Entity Address Table, which holds patient, provider, prescriber, and other address details. In addition, email addresses are replaced with anonymous values, and portal log-in passwords are replaced with random data.
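As a rough sketch of this step, the Python below scrambles address fields and replaces log-in details. The column names (street, city, postal_code, user_id, email, password_hash), the choice to keep each field's original length, and the placeholder email format are assumptions for illustration; the source does not specify them.

```python
import secrets
import string


def random_text(length: int) -> str:
    """Return a random alphanumeric string of the given length."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def scramble_address(row: dict) -> dict:
    """Replace each address field with random characters of the same length."""
    scrambled = dict(row)
    for field in ("street", "city", "postal_code"):  # hypothetical column names
        scrambled[field] = random_text(len(row[field]))
    return scrambled


def anonymize_login(row: dict) -> dict:
    """Swap the email for a placeholder and the password for random data."""
    masked = dict(row)
    # The .invalid top-level domain is reserved, so these addresses never resolve.
    masked["email"] = f"user{row['user_id']}@example.invalid"
    masked["password_hash"] = secrets.token_hex(32)  # 64 random hex characters
    return masked
```

Keeping the original field lengths is a convenience so that scrambled values still fit existing column widths; a fixed length would work equally well if length itself is considered sensitive.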

Verification and delivery

Once the de-identified file has been created, a second technical resource verifies the de-identification and records the completion and validation of the work in the change control. After successful verification, the data is securely delivered to the development team.
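Part of such a verification could be automated. The sketch below, which assumes the reviewer has both the original and the de-identified rows in the same order, flags any field whose value survived unchanged. The function name find_leaks and the field names are hypothetical, and a check like this would supplement rather than replace the manual review.

```python
def find_leaks(original_rows, masked_rows, fields):
    """Return (row_index, field) pairs where the masked value equals the original."""
    leaks = []
    for index, (original, masked) in enumerate(zip(original_rows, masked_rows)):
        for field in fields:
            if masked[field] == original[field]:
                leaks.append((index, field))
    return leaks


original = [
    {"first_name": "David", "last_name": "Anderson"},
    {"first_name": "Maria", "last_name": "Lopez"},
]
masked = [
    {"first_name": "Dlhh3#4ad", "last_name": "Aasdf3453@$"},
    {"first_name": "Mq8z#1xk", "last_name": "Lopez"},  # last name was missed
]

print(find_leaks(original, masked, ["first_name", "last_name"]))
# → [(1, 'last_name')]
```

An empty result means none of the checked fields still holds its production value; it does not prove that identifiers are absent from free-text or unchecked columns.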

Final thoughts

The country’s growing adoption of health information technology is increasing its potential to support valuable research that draws on enormous, complex data sets from many sources. By removing identifiers from health information, de-identification reduces privacy risks for individuals and allows secondary use of the data for comparative effectiveness studies, policy assessment, life sciences research, and other initiatives.
