Search code examples
jsonpostgresqlmultimaphstore

Postgresql merge rows with same key (hstore or json)


I have a table like this:

+--------+--------------------+   
|   ID   |   Attribute        |  
+--------+--------------------+ 
|    1   |"color" => "red"    |    
+--------+--------------------+  
|    1   |"color" => "green"  | 
+--------+--------------------+ 
|    1   |"shape" => "square" | 
+--------+--------------------+ 
|    2   |"color" => "blue"   | 
+--------+--------------------+ 
|    2   |"color" => "black"  | 
+--------+--------------------+ 
|    2   |"flavor" => "sweat" | 
+--------+--------------------+ 
|    2   |"flavor" => "salty" | 
+--------+--------------------+ 

And I want to run some postgres query that get a result table like this:

+--------+------------------------------------------------------+   
|   ID   |                    Attribute                         |  
+--------+------------------------------------------------------+ 
|    1   |"color" => "red, green", "shape" => "square"          |    
+--------+------------------------------------------------------+  
|    2   |"color" => "blue, black", "flavor" => "sweat, salty"  | 
+--------+------------------------------------------------------+ 

The attribute column can either be hstore or json format. I wrote it in hstore for an example, but if we cannot achieve this in hstore, but in json, I would change the column to json.

I know that hstore does not support one key to multiple values, when I tried some merge method, it only kept one value for each key. But for json, I didn't find anything that supports multiple value merge like this neither. I think this can be done by function merging values for the same key into a string/text and add it back to the key/value pair. But I'm stuck in implementing it.

Note: if implement this in some function, ideally any key such as color, shape should not appear in the function since keys can be expanded dynamically.

Does anyone have any idea about this? Any advice or brainstorm might help. Thank you!


Solution

  • Just a note before anything else: in your desidered output I would use some proper json and not that kind of lookalike. So a correct output according to me would be:

    +--------+----------------------------------------------------------------------+   
    |   ID   |                             Attribute                                |  
    +--------+----------------------------------------------------------------------+ 
    |    1   | '{"color":["red","green"], "flavor":[], "shape":["square"]}'         |    
    +--------+----------------------------------------------------------------------+  
    |    2   | '{"color":["blue","black"], "flavor":["sweat","salty"], "shape":[]}' | 
    +--------+----------------------------------------------------------------------+ 
    

    A PL/pgSQL function which parses the json attributes and executes a dynamic query would do the job, something like that:

    CREATE OR REPLACE FUNCTION merge_rows(PAR_table regclass) RETURNS TABLE (
        id          integer,
        attributes  json
    ) AS $$
    DECLARE
        ARR_attributes  text[];
        VAR_attribute   text;
        ARR_query_parts text[];
    BEGIN
        -- Get JSON attributes names
        EXECUTE format('SELECT array_agg(name ORDER BY name) AS name FROM (SELECT DISTINCT json_object_keys(attribute) AS name FROM %s) AS s', PAR_table) INTO ARR_attributes;
    
        -- Write json_build_object() query part
        FOREACH VAR_attribute IN ARRAY ARR_attributes LOOP
            ARR_query_parts := array_append(ARR_query_parts, format('%L, array_remove(array_agg(l.%s), null)', VAR_attribute, VAR_attribute));
        END LOOP;
    
        -- Return dynamic query
        RETURN QUERY EXECUTE format('
            SELECT t.id, json_build_object(%s) AS attributes 
                FROM %s AS t, 
                LATERAL json_to_record(t.attribute) AS l(%s) 
                GROUP BY t.id;', 
            array_to_string(ARR_query_parts, ', '), PAR_table, array_to_string(ARR_attributes, ' text, ') || ' text');
    END;
    $$ LANGUAGE plpgsql;
    

    I've tested it and it seems to work, it returns a json with. Here is my test code:

    CREATE TABLE mytable (
        id          integer NOT NULL,
        attribute   json    NOT NULL
    
    );
    INSERT INTO mytable (id, attribute) VALUES 
    (1, '{"color":"red"}'),
    (1, '{"color":"green"}'),
    (1, '{"shape":"square"}'),
    (2, '{"color":"blue"}'),
    (2, '{"color" :"black"}'),
    (2, '{"flavor":"sweat"}'),
    (2, '{"flavor":"salty"}');
    
    SELECT * FROM merge_rows('mytable');
    

    Of course you can pass the id and attribute column names as parameters as well and maybe refine the function a bit, this is just to give you an idea.

    EDIT : If you're on 9.4 please consider using jsonb datatype, it's much better and gives you room for improvements. You would just need to change the json_* functions to their jsonb_* equivalents.