Search code examples
mysqlselectsubqueryquery-optimizationaggregate-functions

Using two aggregate values from a SELECT subquery in a single outer SELECT query


For relevant tables and columns (a lot more exist than the following), I have a customer table with cust_id and state columns and an account table with account_id, cust_id, and avail_balance columns.

Example customer table:

| cust_id | state |
|--------:|-------|
|       1 | MA    |
|       2 | MA    |
|       3 | MA    |
|       4 | MA    |
|       5 | NH    |
|       6 | MA    |
|       7 | MA    |
|       8 | NH    |
|       9 | MA    |
|      10 | NH    |
|      11 | MA    |
|      12 | NH    |
|      13 | MA    |

Example account table:

| account_id | cust_id | avail_balance |
|-----------:|--------:|--------------:|
|          1 |       1 |       1057.75 |
|          2 |       1 |           500 |
|          3 |       1 |          3000 |
|          4 |       2 |       2258.02 |
|          5 |       2 |           200 |
|          7 |       3 |       1057.75 |
|          8 |       3 |        2212.5 |
|         10 |       4 |        534.12 |
|         11 |       4 |        767.77 |
|         12 |       4 |       5487.09 |
|         13 |       5 |       2237.97 |
|         14 |       6 |        122.37 |
|         15 |       6 |         10000 |
|         17 |       7 |          5000 |
|         18 |       8 |       3487.19 |
|         19 |       8 |        387.99 |
|         21 |       9 |        125.67 |
|         22 |       9 |       9345.55 |
|         23 |       9 |          1500 |
|         24 |      10 |      23575.12 |
|         25 |      10 |             0 |
|         27 |      11 |       9345.55 |
|         28 |      12 |      38552.05 |
|         29 |      13 |         50000 |

Here is an example of a query that I wish to optimize.

SELECT account.cust_id
FROM account, customer
WHERE
      account.cust_id = customer.cust_id
  AND customer.state = 'MA'
  AND customer.cust_id >
      (SELECT MAX(customer.cust_id) AS max_nh_cust_id
       FROM account, customer
       WHERE
             account.cust_id = customer.cust_id
         AND state = 'NH')
  AND account.avail_balance >
      (SELECT MAX(avail_balance) AS max_nh_avail_balance
       FROM account, customer
       WHERE
             account.cust_id = customer.cust_id
         AND state = 'NH');

Expected result for the above example data (may not be just 1 result if data is different):

| cust_id |
|--------:|
|      13 |

The problem with the above is the code redundancy and how we have to iterate multiple times through the account table with the two subqueries. I was hoping that it would be possible to use one subquery to get the maximum cust_id and avail_balance from the account table and use those two scalars in the outer query. For example, the shape of the query might look something like this:

SELECT account.cust_id
FROM account, customer
WHERE
      account.cust_id = customer.cust_id
  AND (customer.cust_id > max_nh_cust_id AND account.avail_balance) > max_nh_avail_balance) IN
      (SELECT MAX(customer.cust_id) AS max_nh_cust_id, MAX(avail_balance) AS max_nh_avail_balance
       FROM account, customer
       WHERE
             account.cust_id = customer.cust_id
         AND state = 'NH');

Obviously, the above does not work. What is the most efficient way of implementing something like the above with less code redundancy and only in one query (that may contain subqueries)?


Solution

  • You can merge the 2 subqueries into 1:

    SELECT MAX(c.cust_id) AS max_nh_cust_id,
           MAX(a.avail_balance) AS max_nh_avail_balance 
    FROM account a INNER JOIN customer c 
    ON a.cust_id = c.cust_id 
    WHERE c.state = 'NH'
    

    and join it like this:

    SELECT a.cust_id
    FROM account a 
    INNER JOIN customer c ON a.cust_id = c.cust_id
    INNER JOIN (
      SELECT MAX(c.cust_id) AS max_nh_cust_id,
             MAX(a.avail_balance) AS max_nh_avail_balance 
      FROM account a INNER JOIN customer c 
      ON a.cust_id = c.cust_id 
      WHERE c.state = 'NH'
    ) t ON c.cust_id > t.max_nh_cust_id AND a.avail_balance > t.max_nh_avail_balance
    WHERE c.state = 'MA'
    

    See the demo.
    Results:

    > | cust_id |
    > | ------: |
    > |      13 |