Search code examples
hiveprestocorrelated-subquerytrino

Workaround for a correlated subquery


I need to run the following join without using a correlated subquery, as I am restricted to either using Hive or Presto, both of which fail due to my using a correlated subquery.

I have worked this down to a MWE. I have a table of each user and their 18th birthdays. I have another table of each time each user visited a movie theatre. I want to merge in only the last time a user visited my movie cinema. The code that would work on native SQL is below.

What is the most efficient workaround that does not require me to join every instance of the user visiting the movie theatre (it is far too large).

SELECT
    people.*,
    tickets.uid
    tickets.date
FROM all_customers as people
JOIN tkting as tickets
    on people.uid = tickets.uid
    and tickets.date = (select
                            lastvisit.date
                        from tickets as lastvisit
                        where
                            lastvisit.uid = people.uid
                            and lastvisit.date < people.birthday_18
                        order by lastvisit.date asc
                        limit 1)

Solution

  • Instead of this inner query:

    SELECT lastvisit.date
    ...
    ORDER BY lastvisit.date ASC
    LIMIT 1
    

    you can try with:

    SELECT min(lastvisit.date)
    ...