Search code examples
sqloptimizationquery-optimization

How to re-write SQL query to be more efficient?


I've got a query that's decently sized on it's own, but there's one section of it that turns it into something ridiculously large (billions of rows returned type thing).

There must be a better way to write it than what I have done.

To simplify the section of the query in question, it takes the client details from one table and tries to find the most recent transaction dates in their savings and spending accounts (not the actual situation, but close enough).

I've joined it with left joins because if someone (for example) doesn't have a savings account, I still want the client details to pop up. But when there's hundreds of thousands of clients with tens of thousands of transactions, it's a little slow to run.

select client_id, max(e.transation_date), max(s.transaction_date)
from client_table c
    left join everyday_account e
        on c.client_id = e.client_id
    left join savings_account s
        on c.client_id = s.client_id
group by client_id

I'm still new to this so I'm not great at knowing how to optimise things, so is there any thing I should be looking at? Perhaps different joins, or something other than max()?

I've probably missed some key details while trying to simplify it, let me know if so!


Solution

  • Sometimes aggregating first, then joining to the aggregated result is faster. But this depends on the actual DBMS being used and several other factors.

    select client_id, e.max_everyday_transaction_date, s.max_savings_transaction_date
    from client_table c
      left join (
        select client_id, max(transaction_date) as max_everyday_transaction_date
        from everyday_account 
        group by client_id
      ) e on c.client_id = e.client_id
      left join (
        select client_id, max(transaction_date) as max_savings_transaction_date
        from savings_account
      ) s on c.client_id = s.client_id
    

    The indexes suggested by Tim Biegeleisen should help in this case as well.

    But as the query has to process all rows from all tables there no good way to speed up this query, other than throwing more hardware at it. If your database supports it, make sure parallel query is enabled (which will distribute the total work over multiple threads in the backend which can substantially improve query performance if the I/O system can keep up)