In Postgres I am taking each row of the table and adding a new column that is an md5 hash of that rows contents.
alter table my_table add hash_col text;
update my_table set hash_col =md5(cast((my_table.*) as text));
My only issue is how would you cast all columns except 1 or 2? Right now the cast function is:
cast(my_table.*)
Psuedo-code for my desired outcome would be:
cast(my_table.* EXCEPT col1, col2)
Unfortunately, SQL has no way of saying "all columns of the table except these ones". It can just say "all columns of the table" with SELECT *
.
You could do something similar with jsonb
:
SELECT to_jsonb(my_table) - '{col1, col2}' FROM my_table;
So this works:
UPDATE my_table t
SET hash_col = md5((to_jsonb(t.*) - '{col1, col2}'::text[])::text);
Related:
But that's considerably more expensive than spelling out columns to be included. And the hash value, while still distinct per given set of values, is based on the (longer) text representation of the jsonb
value. And if you change the table definition of my_table
later, this is likely to break existing hash values. So, it's typically also less safe than spelling out columns to be included.
Spell out remaining columns. Make repeating the (possibly lengthy?) list of columns simpler and safer by encapsulating the hash generation in a function:
CREATE OR REPLACE FUNCTION public.f_myhash(col3 int, col4 int, col5 text)
RETURNS uuid
LANGUAGE sql IMMUTABLE COST 25 PARALLEL SAFE AS
'SELECT md5(textin(record_out(($1,$2,$3))))::uuid';
Adapt column names and types to your actual list.
Then:
ALTER TABLE my_table ADD hash_col uuid;
UPDATE my_table SET hash_col = public.f_myhash(col3, col4, col5);
While being at it, I converted the MD5 hash to type uuid
, which is typically superior in every way. See:
If you don't actually need MD5, consider hash_record_extended()
instead. Simpler and faster. But it generates "only" a bigint
hash.
And maybe an expression index is good enough and you don't even need to add another column?
See (!):