Search code examples
sqlprestotrinoamazon-athena

Trino/Presto/Athena: "Given Correlated Subquery is supported" (Trino v371, Athena v2/Presto v0.217)


Preface: There may not be a solution to this one, but I wanted to create a question here so others can find it later if they run into the same thing.

I have an application which needs to return potentially arbitrarily nested JSON to clients.

Thanks to a lot of help, I managed to get a working query for nesting JSON up to a depth of 3 levels.

The following example query (users -> todo_lists -> todos) works on all of Trino/Presto and Athena:

-- sample data
with users (user_id, name) as (values (1, 'Alice'),
                                  (2, 'Bob'),
                                  (3, 'Charlie')),
 todo_lists (todo_list_id, user_id, title) as (values (1, 1, 'todo list 1'),
                                                      (2, 1, 'todo list 2'),
                                                      (3, 2, 'todo list 3'),
                                                      (4, 3, 'todo list 4')),
 todos (todo_id, todo_list_id, title) as (values (1, 1, 'todo 1'),
                                                 (2, 1, 'todo 2'),
                                                 (3, 2, 'todo 3'),
                                                 (4, 3, 'todo 4'))
-- query
select *
from (select cast(array_agg(
    map(array['user_id', 'name', 'todo_lists'],
        array[user_id, name, cast(todo_lists as json) ]))
             as json)
  from (select cast(u.user_id as json)   user_id,
               cast(max(u.name) as json) name,
               cast(array_agg(
                       map(array['todo_list_id', 'title', 'todos'],
                           array[cast(tl.todo_list_id as json),
                           cast(tl.title as json),
                           cast(
                                   (select array_agg(
                                                   map(array['todo_id', 'title'],
                                                       array[cast(t.todo_id as json),
                                                       cast(t.title as json) ]))
                                    from todos t
                                    where t.todo_list_id = tl.todo_list_id)
                               as json) ]))
                   as json)              todo_lists
        from users u
                 join todo_lists tl on tl.user_id = u.user_id
        group by u.user_id) t) t;

-- results
-- [{"name":"Alice","todo_lists":[{"title":"todo list 2","todo_list_id":2,"todos":[{"title":"todo 3","todo_id":3}]},{"title":"todo list 1","todo_list_id":1,"todos":[{"title":"todo 1","todo_id":1},{"title":"todo 2","todo_id":2}]}],"user_id":1},{"name":"Charlie","todo_lists":[{"title":"todo list 4","todo_list_id":4,"todos":[null]}],"user_id":3},{"name":"Bob","todo_lists":[{"title":"todo list 3","todo_list_id":3,"todos":[{"title":"todo 4","todo_id":4}]}],"user_id":2}]

Now, if we try to add a 4th nesting depth to the query, we get a failure on every engine:

-- sample data
with users (user_id, name) as (values (1, 'Alice'),
                                  (2, 'Bob'),
                                  (3, 'Charlie')),
 todo_lists (todo_list_id, user_id, title) as (values (1, 1, 'todo list 1'),
                                                      (2, 1, 'todo list 2'),
                                                      (3, 2, 'todo list 3'),
                                                      (4, 3, 'todo list 4')),
 todos (todo_id, todo_list_id, title) as (values (1, 1, 'todo 1'),
                                                 (2, 1, 'todo 2'),
                                                 (3, 2, 'todo 3'),
                                                 (4, 3, 'todo 4')),
 todo_items (todo_item_id, todo_id, title) as (values (1, 1, 'todo item 1'),
                                                      (2, 1, 'todo item 2'),
                                                      (3, 2, 'todo item 3'),
                                                      (4, 2, 'todo item 4'),
                                                      (5, 3, 'todo item 5'),
                                                      (6, 3, 'todo item 6'),
                                                      (7, 4, 'todo item 7'),
                                                      (8, 4, 'todo item 8'))
-- query
select cast(array_agg(
    map(array['user_id', 'name', 'todo_lists'],
        array[user_id, name, cast(todo_lists as json) ]))
       as json)
from (select cast(user_id as json)   user_id,
         cast(name as json) name,
         cast(todo_lists as json)   todo_lists
    from (select cast(u.user_id as json)   user_id,
                 cast(max(u.name) as json) name,
                 cast(array_agg(
                         map(array['todo_list_id', 'title', 'todos'],
                             array[cast(tl.todo_list_id as json),
                             cast(tl.title as json),
                             cast(
                                     (select array_agg(
                                                     map(array['todo_id', 'title', 'todo_items'],
                                                         array[cast(t.todo_id as json),
                                                         cast(t.title as json),
                                                         cast(
                                                                 (select array_agg(
                                                                                 map(array['todo_item_id', 'title'],
                                                                                     array[cast(ti.todo_item_id as json),
                                                                                     cast(ti.title as json) ]))
                                                                  from todo_items ti
                                                                  where ti.todo_id = t.todo_id)
                                                             as json) ]))
                                      from todos t
                                      where t.todo_list_id = tl.todo_list_id)
                                 as json) ]))
                     as json)              todo_lists
          from users u
                   join todo_lists tl on tl.user_id = u.user_id
          group by u.user_id) t
    ) t;

  • Trino v371 trino query failure

  • Athena v2 (Presto v0.217) athena query failure

I found similar questions, but none with a definitive answer:


Solution

  • To update this:

    • The creator of Presto/Trino said that this behavior is not intentional and is a bug in the engine:

    enter image description here

    The issue is here: https://github.com/trinodb/trino/issues/14199

    There is an alternate way of writing queries that doesn't require correlation I have attached in the issues thread as a reply.