Preface: There may not be a solution to this one, but I wanted to create a question here so others can find it later if they run into the same thing.
I have an application which needs to return potentially arbitrarily nested JSON to clients.
Thanks to a lot of help, I managed to get a working query for nesting JSON up to a depth of 3 levels.
The following example query (users -> todo_lists -> todos
) works on all of Trino/Presto and Athena:
-- sample data
with users (user_id, name) as (values (1, 'Alice'),
(2, 'Bob'),
(3, 'Charlie')),
todo_lists (todo_list_id, user_id, title) as (values (1, 1, 'todo list 1'),
(2, 1, 'todo list 2'),
(3, 2, 'todo list 3'),
(4, 3, 'todo list 4')),
todos (todo_id, todo_list_id, title) as (values (1, 1, 'todo 1'),
(2, 1, 'todo 2'),
(3, 2, 'todo 3'),
(4, 3, 'todo 4'))
-- query
select *
from (select cast(array_agg(
map(array['user_id', 'name', 'todo_lists'],
array[user_id, name, cast(todo_lists as json) ]))
as json)
from (select cast(u.user_id as json) user_id,
cast(max(u.name) as json) name,
cast(array_agg(
map(array['todo_list_id', 'title', 'todos'],
array[cast(tl.todo_list_id as json),
cast(tl.title as json),
cast(
(select array_agg(
map(array['todo_id', 'title'],
array[cast(t.todo_id as json),
cast(t.title as json) ]))
from todos t
where t.todo_list_id = tl.todo_list_id)
as json) ]))
as json) todo_lists
from users u
join todo_lists tl on tl.user_id = u.user_id
group by u.user_id) t) t;
-- results
-- [{"name":"Alice","todo_lists":[{"title":"todo list 2","todo_list_id":2,"todos":[{"title":"todo 3","todo_id":3}]},{"title":"todo list 1","todo_list_id":1,"todos":[{"title":"todo 1","todo_id":1},{"title":"todo 2","todo_id":2}]}],"user_id":1},{"name":"Charlie","todo_lists":[{"title":"todo list 4","todo_list_id":4,"todos":[null]}],"user_id":3},{"name":"Bob","todo_lists":[{"title":"todo list 3","todo_list_id":3,"todos":[{"title":"todo 4","todo_id":4}]}],"user_id":2}]
Now, if we try to add a 4th nesting depth to the query, we get a failure on every engine:
-- sample data
with users (user_id, name) as (values (1, 'Alice'),
(2, 'Bob'),
(3, 'Charlie')),
todo_lists (todo_list_id, user_id, title) as (values (1, 1, 'todo list 1'),
(2, 1, 'todo list 2'),
(3, 2, 'todo list 3'),
(4, 3, 'todo list 4')),
todos (todo_id, todo_list_id, title) as (values (1, 1, 'todo 1'),
(2, 1, 'todo 2'),
(3, 2, 'todo 3'),
(4, 3, 'todo 4')),
todo_items (todo_item_id, todo_id, title) as (values (1, 1, 'todo item 1'),
(2, 1, 'todo item 2'),
(3, 2, 'todo item 3'),
(4, 2, 'todo item 4'),
(5, 3, 'todo item 5'),
(6, 3, 'todo item 6'),
(7, 4, 'todo item 7'),
(8, 4, 'todo item 8'))
-- query
select cast(array_agg(
map(array['user_id', 'name', 'todo_lists'],
array[user_id, name, cast(todo_lists as json) ]))
as json)
from (select cast(user_id as json) user_id,
cast(name as json) name,
cast(todo_lists as json) todo_lists
from (select cast(u.user_id as json) user_id,
cast(max(u.name) as json) name,
cast(array_agg(
map(array['todo_list_id', 'title', 'todos'],
array[cast(tl.todo_list_id as json),
cast(tl.title as json),
cast(
(select array_agg(
map(array['todo_id', 'title', 'todo_items'],
array[cast(t.todo_id as json),
cast(t.title as json),
cast(
(select array_agg(
map(array['todo_item_id', 'title'],
array[cast(ti.todo_item_id as json),
cast(ti.title as json) ]))
from todo_items ti
where ti.todo_id = t.todo_id)
as json) ]))
from todos t
where t.todo_list_id = tl.todo_list_id)
as json) ]))
as json) todo_lists
from users u
join todo_lists tl on tl.user_id = u.user_id
group by u.user_id) t
) t;
I found similar questions, but none with a definitive answer:
JOIN
is feasible in the subqueries constructing the objects here but I may be wrong)To update this:
The issue is here: https://github.com/trinodb/trino/issues/14199
There is an alternate way of writing queries that doesn't require correlation I have attached in the issues thread as a reply.