Search code examples
postgresqlgroup-bywindow-functionsrow-number

Conditional ROW NUMBER or COMBINE GROUP BYs


I need to GROUP BY two different columns and combine. Furthermore, IF a size with a different group_id matches another size, then I want to include ALL of those rows for that group_id.

For example, since 55 is in group_id 10 and group_id 20 I want to include all of the results in group_id 20 (IE 55,88,55,88).

p_id  | group_id | size | the_date    
------+----------+-----------+--------
1     |    10    |  55  | 2012-10-24
1     |    10    |  54  | 2014-08-09
1     |    20    |  55  | 2013-05-20
1     |    20    |  88  | 2014-05-20
1     |    20    |  55  | 2015-05-20
1     |    30    |  33  | 2014-05-20
1     |    30    |  55  | 2015-05-20
1     |    30    |  33  | 2015-05-20
1     |    40    |  99  | 2015-05-20

I've looked at some of the other questions on stack AND tried using row_number and first_value but nothing has worked. Below is my desired result.

p_id  | group_id | size |       group_id_arr         |         size_arr            |  date_arr    
------+----------+------+----------------------------+-----------------------------+-----------
1     |    10    |   55 | {10,20,30,20,10,30,30,20}  |  {55,55,33,88,55,55,33,55}  | {2012-10-24,2013-05-20,2014-05-20,2014-05-20,2014-08-09,2015-05-20,2015-05-20,2015-05-20}
1     |    40    |   99 |          {40}              |            {99}             | {2015-05-20}

Here is my DBFiddle where I am trying to achieve the following so I can aggregrate.

 rn   |   p_id   | group_id | size | the_date    
------+----------+----------+------+-----------
1     |    1     |    10    |  55  | 2012-10-24     
2     |    1     |    20    |  55  | 2013-05-20     
3     |    1     |    30    |  33  | 2014-05-20     
4     |    1     |    20    |  88  | 2014-05-20     
5     |    1     |    10    |  54  | 2014-08-09     
6     |    1     |    30    |  55  | 2015-05-20     
7     |    1     |    30    |  33  | 2015-05-20     
8     |    1     |    20    |  55  | 2015-05-20     
1     |    1     |    40    |  99  | 2015-05-20     

Please let me know if you have any questions. I'm open to all approaches, thanks.


Solution

  • I am not quiet sure about the ordering in the final arrays but this should be a detail you can fix:

    demo: db<>fiddle

    WITH sizes as (
        SELECT group_id, the_date, array_agg(size) sizes
        FROM base_table
        GROUP BY group_id, the_date
    )
    SELECT 
        a_group_id, 
        array_agg(b_group_id order by the_date), 
        array_agg(size order by the_date) as sizes, 
        array_agg(the_date order by the_date)
    FROM (
        SELECT 
            a.group_id as a_group_id, 
            b.group_id as b_group_id, 
            unnest(b.sizes) size, b.the_date
        FROM sizes a 
        INNER JOIN sizes b 
        ON a.sizes && b.sizes
    ) s
    GROUP BY a_group_id
    
    1. Aggregate the sizes per group_id
    2. Cross join the aggregates with themself where one size array has at least one member with another. That gives you a table that you expected and decribed.
    3. Grouping by the group_id.