sql postgresql amazon-web-services hipaa pii

De-Identifying PHI For HIPAA


I have a SQL DB, hosted on AWS, which contains PHI. I want to access this data to perform analytics; however, I must de-identify the data first to comply with HIPAA.

How should I approach this? I have thought of a few approaches:

  1. Simply de-identify the DB in place with SQL commands.
  2. From now on, every time data is added to the DB, add a de-identified version of that data to another DB. Then access that DB for analytics.
  3. From now on, every time data is added to the DB, add a de-identified version of that data to another table in the same DB. Then access that table with SQL commands for analytics.

Which is the best approach to use to maintain compliance with HIPAA? Or, is there a better way?

Thanks!


Solution

  • Budget allowing, consider doing your analytics on a different system and de-identifying the data during the ETL. Changing the source system to accommodate this requirement will make it more complex to maintain and will likely affect other integrations; you might end up with a monolith.

    There are various ways to do this. You could use AWS DMS (with ongoing replication) with the DB as your source and S3 as the target (Parquet format). From there you could use Athena for analytics, as jarmod highlighted; it also supports the Parquet format and lets you analyze your data with SQL queries. Other options include Redshift, another relational DB, or other analytics platforms.
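As a rough illustration of the "de-identify during the ETL" step, here is a minimal Python sketch of a per-row transform. Everything in it is an assumption for illustration: the column names (`patient_id`, `date_of_birth`, `zip`, and so on), the `DIRECT_IDENTIFIERS` set, and the in-memory `crosswalk` dict are hypothetical and not taken from the question. It drops direct identifiers, swaps the patient ID for a random surrogate key, and coarsens the birth date and ZIP code. It does not cover the full list of identifiers under HIPAA's Safe Harbor method (for example, it does not aggregate ages over 89 or blank out low-population ZIP prefixes), so treat it as a starting point rather than a compliance guarantee.

```python
import secrets
from datetime import date

# Hypothetical column names for direct identifiers; adjust to the real schema.
DIRECT_IDENTIFIERS = {"name", "ssn", "email", "phone", "street_address"}


def surrogate_for(patient_id, crosswalk):
    """Return a random surrogate key, reusing the one already issued for this patient.

    The crosswalk (real ID -> surrogate) must stay on the PHI side and never
    be loaded into the analytics store.
    """
    if patient_id not in crosswalk:
        crosswalk[patient_id] = secrets.token_hex(16)
    return crosswalk[patient_id]


def deidentify(row, crosswalk):
    """Return a de-identified copy of one source row (the input is not modified)."""
    # Drop columns that identify the person directly.
    out = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
    # Replace the real ID with a random key so rows can still be joined per patient.
    out["patient_key"] = surrogate_for(out.pop("patient_id"), crosswalk)
    # Keep only the year of birth instead of the full date.
    out["birth_year"] = out.pop("date_of_birth").year
    # Keep only the first three digits of the ZIP code.
    out["zip3"] = out.pop("zip")[:3]
    return out


if __name__ == "__main__":
    crosswalk = {}
    source_row = {
        "patient_id": 42,
        "name": "Jane Doe",
        "ssn": "000-00-0000",
        "date_of_birth": date(1980, 5, 17),
        "zip": "02139",
        "diagnosis_code": "E11.9",
    }
    print(deidentify(source_row, crosswalk))
```

The transform would run between the extract and the load, so that only its output reaches S3, Redshift, or whichever analytics target is chosen. A random surrogate key is used here instead of a hash of the ID so that the key carries no information about the original value.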