There is a really complicated query, which I cannot wrap my head around alone. In the queried table there are 5 columns: PK_ID
, which is the unique identifier of the value, ID
, which connects the aggregated values in one group, the count
, which says, how many value groups should be aggregated (if there aren't enough, the group with such ID
should be ommited from the query results), the num
, which says the number of the aggregated value inside the group, and the value
to be aggregated.
(Table T1)
PK_ID | ID | num | count | value
------+-------+-------+--------+--------
1 | 1 | 1 | 3 | A
------+-------+-------+--------+--------
2 | 1 | 2 | 3 | B
------+-------+-------+--------+--------
3 | 1 | 3 | 3 | C
------+-------+-------+--------+--------
4 | 1 | 1 | 3 | D
------+-------+-------+--------+--------
5 | 1 | 3 | 3 | E
------+-------+-------+--------+--------
6 | 1 | 2 | 3 | F
------+-------+-------+--------+--------
7 | 1 | 3 | 3 | G
------+-------+-------+--------+--------
8 | 2 | 1 | 2 | H
------+-------+-------+--------+--------
9 | 2 | 2 | 2 | I
------+-------+-------+--------+--------
10 | 2 | 1 | 2 | J
------+-------+-------+--------+--------
11 | 3 | 1 | 5 | X
The results of such query would be:
PK_ID | T1_ID | cross_aggr | T1_PK_IDs
------+-------+------------+-----------
1 | 1 | {A, B, C} | {1, 2, 3}
------+-------+------------+-----------
2 | 1 | {A, B, E} | {1, 2, 5}
------+-------+------------+-----------
3 | 1 | {A, B, G} | {1, 2, 7}
------+-------+------------+-----------
4 | 1 | {A, F, C} | {1, 6, 3}
------+-------+------------+-----------
5 | 1 | {A, F, E} | {1, 6, 5}
------+-------+------------+-----------
6 | 1 | {A, F, G} | {1, 6, 7}
------+-------+------------+-----------
7 | 1 | {D, B, C} | {4, 2, 3}
------+-------+------------+-----------
8 | 1 | {D, B, E} | {4, 2, 5}
------+-------+------------+-----------
9 | 1 | {D, B, G} | {4, 2, 7}
------+-------+------------+-----------
10 | 1 | {D, F, C} | {4, 6, 3}
------+-------+------------+-----------
11 | 1 | {D, F, E} | {4, 6, 5}
------+-------+------------+-----------
12 | 1 | {D, F, G} | {4, 6, 7}
------+-------+------------+-----------
13 | 2 | {H, I} | {8, 9}
------+-------+------------+-----------
14 | 2 | {J, I} | {10, 9}
So it returned every possible combination of value
s inside each group, defined by ID
and described by count
, and those values are aggregated in the order of each individual num
.
It looks like I have to use the CROSS JOIN
(in the same manner as the cartesian product would work in linear algebra), but the CROSS JOIN
seems to work only if the number of JOIN
ed tables is known beforehand, but there is the count
complication. The count
itself could potentially (I mean it doesn't but it could) range from 2
to infinity
.
I have thought of having it processed by a WINDOW
function, PARTITION BY
the ID
and by num
, but after that I'm stuck, since there seems no way to have a cartesian product of those partitions.
Should I call it a day and find another way to process the data, or is there a way to compose such a query?
This reads like a graph-walking problem - which requires a recursive query.
with recursive cte as (
select id, num, cnt, 1 as lvl,
array[value::text] as arr_values,
array[pk_id::int] as arr_pk_id
from mytable where num = 1
union all
select c.id, t.num, c.cnt, c.lvl + 1,
c.arr_values || t.value::text,
c.arr_pk_id || t.pk_id
from cte c
inner join mytable t
on t.id = c.id
and t.num = c.num + 1
and t.num <= c.cnt
)
select id, arr_values, arr_pk_id
from cte
where array_length(arr_pk_id, 1) = cnt
order by arr_pk_id
For each id
, the recursive query sarts at num
1
and follows all possible paths. The outer query just filters on the end of paths.
id | arr_values | arr_pk_id -: | :--------- | :-------- 1 | {A,B,C} | {1,2,3} 1 | {A,B,E} | {1,2,5} 1 | {A,B,G} | {1,2,7} 1 | {A,F,C} | {1,6,3} 1 | {A,F,E} | {1,6,5} 1 | {A,F,G} | {1,6,7} 1 | {D,B,C} | {4,2,3} 1 | {D,B,E} | {4,2,5} 1 | {D,B,G} | {4,2,7} 1 | {D,F,C} | {4,6,3} 1 | {D,F,E} | {4,6,5} 1 | {D,F,G} | {4,6,7} 2 | {H,I} | {8,9} 2 | {J,I} | {10,9}