Search code examples
sqlamazon-athenaprestotrino

Transform correlated subquery to be used with presto / athena


I have a mysql query which contains a correlated subquery. This is not supported within presto which is why I am looking for a way to have the same functionality in ansi sql syntax. The query (from this post) is as follows:

SELECT tA.uid, tA.dt, tA.val_A,
       AVG(val_B) AS val_C
   FROM 
(SELECT uid, dt, val_A,
       (SELECT dt FROM tableA ta1 
         WHERE ta1.uid=ta2.uid 
          AND ta1.dt > ta2.dt LIMIT 1) AS dtRg
  FROM tableA ta2) tA
LEFT JOIN tableB tB 
  ON tA.uid=tB.uid
AND tB.dt >= tA.dt
AND tB.dt < tA.dtRg
GROUP BY tA.uid, tA.dt, tA.val_A;

If I run the query in the presto db it throws following error message:

[Simba][AthenaJDBC](100071) An error has been thrown from the AWS Athena client. SYNTAX_ERROR: line 5:9: Given correlated subquery is not supported [Execution ID: 60a6f7e2-fb2b-44e0-a4f2-847ac669e905]

How can I fix this?


Solution

  • If you are ok with >= logic and/or dt's are unique within the same uid you can leverage window functions:

    (SELECT uid,
            dt,
            val_A,
            lead(dt) over (partition by uid order by dt) AS dtRg
    FROM tableA ta2) tA
    

    Otherwise I'm afraid you are limited to performing join and selecting first result from it - for expiration check out this answer (or using row_number window function).