I'd like to optimize the following query. I've noticed that I can reduce the time quite a bit by removing the subquery from billing_payment_applied
in the LEFT-JOIN clause. My question is whether it is possible to do somehow rewrite it without this subquery.
SELECT a, b, c, ...
FROM billing_invoice_item
LEFT JOIN billing_statement
LEFT JOIN
(SELECT SUM(x), invoice_id
FROM billing_payment_applied
WHERE invoice_id != 0 AND (apply_date <= now())
GROUP BY invoice_id) AS BAL
WHERE [...]
It seems to me that this would be possible if we can somehow put a condition on SUM().
What the query above is doing is getting the SUM() of invoices for each record previously collected. I'm thinking of that as a function call. Instead of making various function calls for each record, let's get all the data that the function call needs and do the SUM() later. I'm going to need to be able to identify which values come from billing_payment_applied and which do not, but that's possible because a LEFT JOIN will produce NULL values when no records were found. Also, SUM() will need to be able to sum only those that come from billing_payment_applied, but that also seems possible with a CASE-WHEN statement.
So, I wrote
SELECT SUM(CASE WHEN BAL.id is NULL THEN 0 ELSE BAL.amount), ...
FROM billing_invoice_item
LEFT JOIN billing_statement
LEFT JOIN `billing_payments_applied` bpa on
bpa.invoice_id = stm.id
AND bpa.invoice_id != 0
AND (`apply_date` IS NOT NULL AND `apply_date` <= now())
WHERE [...]
GROUP BY bpa.invoice_id
But this produces a lot less records than the original query, so this can't be right. What I thought it was equivalent isn't really. Where is my misunderstanding?
Original query. In case it helps, here's the original query.
SELECT
`item`.`id` AS `id`,
`inv`.`member_id` AS `member_id`,
`item`.`sku` AS `sku`,
`item`.`item_name` AS `item_name`,
`item`.`price_ext` AS `item_price`,
ROUND( item.`price_ext` - ( item.`price_ext` * ( ( IFNULL( invbal.`amount_applied`, 0 ) * 100 ) / IFNULL( stm.`amount`, 0 ) ) ) / 100, 2 ) AS `item_price_balance`,
`inv`.`invoice_number` AS `invoice_number`,
`inv`.`invoice_date` AS `invoice_date`,
`inv`.`due_date` AS `due_date`,
CASE
WHEN diritem.`type` = 'dues_membership_levels' THEN 'dues'
WHEN diritem.`type` = 'hidden_commerce_dues' THEN 'dues'
WHEN diritem.`type` = 'hidden_commerce_events' THEN 'events'
WHEN diritem.`type` = 'hidden_finance_discounts' THEN 'finance'
WHEN diritem.`type` = 'programs_course_products' THEN 'programs'
WHEN diritem.`type` != '' THEN 'commerce'
ELSE 'billing'
END AS `origin`,
IFNULL( diritem.`display_name`, `item`.`sku` ) AS `title`,
diritem.`category` AS `category`,
CONCAT( 'product_' , IFNULL( diritem.`id`, `item`.`sku` ) ) AS `product_id`,
'' AS `start_date`,
'' AS `ticket_name`,
'' AS `ticket_type`
FROM
`billing_invoice_item` AS `item`
INNER JOIN `billing_invoice` AS `inv` ON
item.`invoice_id` = inv.`id`
AND inv.`invoice_type` = ''
LEFT JOIN `billing_statement` AS `stm` ON
stm.`invoice_number` = inv.`invoice_number`
AND stm.`trans_id` = inv.`id`
LEFT JOIN (
SELECT
SUM( `amount` * -1 ) AS `amount_applied`,
`invoice_id`
FROM
`billing_payments_applied`
WHERE
`invoice_id` != 0
AND ( `apply_date` IS NOT NULL
AND `apply_date` <= '2023-05-02 23:59:59' )
GROUP BY
`invoice_id`) AS invbal ON
invbal.`invoice_id` = stm.`id`
LEFT JOIN `shopping_sku` sku ON
sku.`sku` = item.`sku`
AND sku.`origin` = 'directory'
LEFT JOIN `directory_items` diritem ON
sku.`product_id` = diritem.`id`
WHERE
item.`sku` NOT LIKE 'events_%'
AND IF( inv.`due_date` < IFNULL( stm.`trans_date`, inv.`invoice_date` )
,
inv.`due_date`
,
IFNULL( stm.`trans_date`, inv.`invoice_date` ) ) <= '2023-05-02 23:59:59'
Are you sure about LEFT? That says that data in the "right" table could be missing, and you are willing to get NULLs in place of the SUM, etc. (And you might want to do COALESCE(...)
to turn it into 0
in the outer query.)
What you have should make one pass over billing_payment_applied
to get all the available SUMs (after filtering by WHERE
). This index may help:
INDEX(invoice_id, apply_date, x)
Another way to write the query, but not necessarily any faster in your case:
SELECT ...
COALESCE( ( SELECT SUM(x) ... WHERE ... ), 0) AS bal, -- No GROUP BY
WHERE ...;
(Without knowing which columns are in which tables, I don't see how to include billing_statement
. Please qualify column names with the tables they are in.)