For relevant tables and columns (a lot more exist than the following), I have a customer
table with cust_id
and state
columns and an account
table with account_id
, cust_id
, and avail_balance
columns.
Example customer
table:
| cust_id | state |
|--------:|-------|
| 1 | MA |
| 2 | MA |
| 3 | MA |
| 4 | MA |
| 5 | NH |
| 6 | MA |
| 7 | MA |
| 8 | NH |
| 9 | MA |
| 10 | NH |
| 11 | MA |
| 12 | NH |
| 13 | MA |
Example account
table:
| account_id | cust_id | avail_balance |
|-----------:|--------:|--------------:|
| 1 | 1 | 1057.75 |
| 2 | 1 | 500 |
| 3 | 1 | 3000 |
| 4 | 2 | 2258.02 |
| 5 | 2 | 200 |
| 7 | 3 | 1057.75 |
| 8 | 3 | 2212.5 |
| 10 | 4 | 534.12 |
| 11 | 4 | 767.77 |
| 12 | 4 | 5487.09 |
| 13 | 5 | 2237.97 |
| 14 | 6 | 122.37 |
| 15 | 6 | 10000 |
| 17 | 7 | 5000 |
| 18 | 8 | 3487.19 |
| 19 | 8 | 387.99 |
| 21 | 9 | 125.67 |
| 22 | 9 | 9345.55 |
| 23 | 9 | 1500 |
| 24 | 10 | 23575.12 |
| 25 | 10 | 0 |
| 27 | 11 | 9345.55 |
| 28 | 12 | 38552.05 |
| 29 | 13 | 50000 |
Here is an example of a query that I wish to optimize.
SELECT account.cust_id
FROM account, customer
WHERE
account.cust_id = customer.cust_id
AND customer.state = 'MA'
AND customer.cust_id >
(SELECT MAX(customer.cust_id) AS max_nh_cust_id
FROM account, customer
WHERE
account.cust_id = customer.cust_id
AND state = 'NH')
AND account.avail_balance >
(SELECT MAX(avail_balance) AS max_nh_avail_balance
FROM account, customer
WHERE
account.cust_id = customer.cust_id
AND state = 'NH');
Expected result for the above example data (may not be just 1 result if data is different):
| cust_id |
|--------:|
| 13 |
The problem with the above is the code redundancy and how we have to iterate multiple times through the account
table with the two subqueries. I was hoping that it would be possible to use one subquery to get the maximum cust_id
and avail_balance
from the account
table and use those two scalars in the outer query. For example, the shape of the query might look something like this:
SELECT account.cust_id
FROM account, customer
WHERE
account.cust_id = customer.cust_id
AND (customer.cust_id > max_nh_cust_id AND account.avail_balance) > max_nh_avail_balance) IN
(SELECT MAX(customer.cust_id) AS max_nh_cust_id, MAX(avail_balance) AS max_nh_avail_balance
FROM account, customer
WHERE
account.cust_id = customer.cust_id
AND state = 'NH');
Obviously, the above does not work. What is the most efficient way of implementing something like the above with less code redundancy and only in one query (that may contain subqueries)?
You can merge the 2 subqueries into 1:
SELECT MAX(c.cust_id) AS max_nh_cust_id,
MAX(a.avail_balance) AS max_nh_avail_balance
FROM account a INNER JOIN customer c
ON a.cust_id = c.cust_id
WHERE c.state = 'NH'
and join it like this:
SELECT a.cust_id
FROM account a
INNER JOIN customer c ON a.cust_id = c.cust_id
INNER JOIN (
SELECT MAX(c.cust_id) AS max_nh_cust_id,
MAX(a.avail_balance) AS max_nh_avail_balance
FROM account a INNER JOIN customer c
ON a.cust_id = c.cust_id
WHERE c.state = 'NH'
) t ON c.cust_id > t.max_nh_cust_id AND a.avail_balance > t.max_nh_avail_balance
WHERE c.state = 'MA'
See the demo.
Results:
> | cust_id |
> | ------: |
> | 13 |