Search code examples
sqlpostgresqlsql-updatesequelize.jsgreatest-n-per-group

Update column to any non-unknown value of the column in another row with the same ID


I have a PostgreSQL table in which some of the rows are dirty because at point of data entry the gender of the record is "unknown" although it always stays the same indicated by its object_id, which is the unique identifier of the object.

object_id                           gender
511E1AC7128EE2E74349896B55461F27    Unknown
511E1AC7128EE2E74349896B55461F27    Unknown
511E1AC7128EE2E74349896B55461F27    Male
56665EA256AB3757CDFA6C1CB4334C91    Female
56665EA256AB3757CDFA6C1CB4334C91    Female
56665EA256AB3757CDFA6C1CB4334C91    Unknown
56665EA256AB3757CDFA6C1CB4334C91    Unknown
55C3BFDBD327396E912604D6E635D59B    Unknown
55C3BFDBD327396E912604D6E635D59B    Unknown
55C3BFDBD327396E912604D6E635D59B    Female

Therefore, I want to update my table to be like this:

object_id                           gender
511E1AC7128EE2E74349896B55461F27    Male
511E1AC7128EE2E74349896B55461F27    Male
511E1AC7128EE2E74349896B55461F27    Male
56665EA256AB3757CDFA6C1CB4334C91    Female
56665EA256AB3757CDFA6C1CB4334C91    Female
56665EA256AB3757CDFA6C1CB4334C91    Female
56665EA256AB3757CDFA6C1CB4334C91    Female
55C3BFDBD327396E912604D6E635D59B    Female
55C3BFDBD327396E912604D6E635D59B    Female
55C3BFDBD327396E912604D6E635D59B    Female

In which all the "unknown" columns are converted to whatever non-unknown value the object_id has in /some other entry/. Is there a way I can do this in PostgreSQL - or - even better - Sequelize (Javascript ORM for SQL databases?)


Solution

  • The question is incomplete so I am assuming current Postgres version 9.5 and this table definition:

    CREATE TABLE object (
       object_id uuid PRIMARY KEY  -- ideally a UUID
     , gender    text              -- could probably be boolean
     -- rest irrelevant
       );
    

    Then the solution can be:

    UPDATE object o
    SET    gender = sub.gender
    FROM (
       SELECT object_id, min(gender) AS gender
       FROM   object
       GROUP  BY 1
       ) sub
    WHERE  o.object_id = sub.object_id
    AND    o.gender IS DISTINCT FROM sub.gender;
    

    You need the subquery because aggregate or window functions are not allowed in UPDATE directly.

    The aggregate function min() happens to work because the text 'Unknown' sorts after 'Female' and 'Male'. It would also work for boolean (where 'Unknown' would be NULL), and min() and max() ignore NULL values.

    The last WHERE condition is optional, but it's wise to include it to avoid empty updates. Can be simplified to o.gender <> sub.gender if the column is defined NOT NULL;

    Why would it be ideal to use the data type uuid?