Search code examples
databasepostgresqlconstraintsprimary-keypgcrypto

postgresql primary key constraint on encrypted data using pgcrypto extension


I am using PostgreSQL 8.4, and using pgp_sym_encrypt function in pgcrypto extension to encrypt data upon insertion. so my insert query would look like:

insert into myTable (
                     column1
                    ,column2
                     ) 
values (
        pgp_sym_encrypt('value1','key')
       ,pgp_sym_encrypt('value1','key')
       );

I applied primary key constraint on column1 in the table, but the constraint will not always detect inserting an already existing value because the encrypted data would not always be the same for same decrypted values.

Question :

How can I apply this constraint so it checks the if the decrypted data matches and not allow inserting in that case?


Solution

  • If you wish to enforce uniqueness your best bet would be to hash your information and store that hash in a separate column with a unique index, I often use digests to check for uniqueness on longer text purely because of the small space requirements it has, however due to the irreversible nature of hashes the remainder should be undiscoverable. First you need a column:

    alter table myTable add column column3 bytea unique;
    

    Then for your insert just include the 3rd column

    insert into myTable(
    column1
    ,column2
    ,column3
    )
    values (
    pgp_sym_encrypt('value1','key')
    ,pgp_sym_encrypt('value1','key')
    ,digest('value1', 'sha256')
    )
    

    sha256 should easily garble the data enough that anything stored in that form will never be successfully deciphered and in a binary column will only require 32 bytes, plus index. For the record the data lengths for different hashes are:

    select length(digest('your message goes here', 'sha1'));
    -- 20
    select length(digest('your message goes here', 'sha256'));
    -- 32
    select length(digest('your message goes here', 'sha512'));
    -- 64
    

    Finally, are you encrypting 'value1' twice with 'key' or are they two separate pieces of information? I am going to guess they are supposed to be two different columns of data, if so and you require uniqueness on both you will of course need to create a further column and hash that too.