Search code examples
sqlmysqlperformancequery-optimization

Is it possible to rewrite a SUM-GROUP-BY subquery into a single query? It's a shot on better performance


I'd like to optimize the following query. I've noticed that I can reduce the time quite a bit by removing the subquery from billing_payment_applied in the LEFT-JOIN clause. My question is whether it is possible to do somehow rewrite it without this subquery.

SELECT a, b, c, ...
  FROM billing_invoice_item 
  LEFT JOIN billing_statement
  LEFT JOIN 
    (SELECT SUM(x), invoice_id 
       FROM billing_payment_applied
       WHERE invoice_id != 0  AND (apply_date <= now())
       GROUP BY invoice_id) AS BAL
  WHERE [...]

It seems to me that this would be possible if we can somehow put a condition on SUM().

What the query above is doing is getting the SUM() of invoices for each record previously collected. I'm thinking of that as a function call. Instead of making various function calls for each record, let's get all the data that the function call needs and do the SUM() later. I'm going to need to be able to identify which values come from billing_payment_applied and which do not, but that's possible because a LEFT JOIN will produce NULL values when no records were found. Also, SUM() will need to be able to sum only those that come from billing_payment_applied, but that also seems possible with a CASE-WHEN statement.

So, I wrote

SELECT SUM(CASE WHEN BAL.id is NULL THEN 0 ELSE BAL.amount), ...
  FROM billing_invoice_item 
  LEFT JOIN billing_statement
  LEFT JOIN `billing_payments_applied` bpa on 
            bpa.invoice_id = stm.id 
        AND bpa.invoice_id != 0 
        AND (`apply_date` IS NOT NULL AND `apply_date` <= now())
  WHERE [...]
  GROUP BY bpa.invoice_id

But this produces a lot less records than the original query, so this can't be right. What I thought it was equivalent isn't really. Where is my misunderstanding?

Original query. In case it helps, here's the original query.

SELECT
    `item`.`id` AS `id`,
                                 `inv`.`member_id` AS `member_id`,
                                 `item`.`sku` AS `sku`,
                                 `item`.`item_name` AS `item_name`,
                                 `item`.`price_ext` AS `item_price`,                                 
                                 ROUND( item.`price_ext` - ( item.`price_ext` * ( ( IFNULL( invbal.`amount_applied`, 0 ) * 100 ) / IFNULL( stm.`amount`, 0 ) ) ) / 100, 2 ) AS `item_price_balance`,
                                 `inv`.`invoice_number` AS `invoice_number`,
                                 `inv`.`invoice_date` AS `invoice_date`,
                                 `inv`.`due_date` AS `due_date`,
                                 CASE
                                    WHEN diritem.`type` = 'dues_membership_levels' THEN 'dues'
        WHEN diritem.`type` = 'hidden_commerce_dues' THEN 'dues'
        WHEN diritem.`type` = 'hidden_commerce_events' THEN 'events'
        WHEN diritem.`type` = 'hidden_finance_discounts' THEN 'finance'
        WHEN diritem.`type` = 'programs_course_products' THEN 'programs'
        WHEN diritem.`type` != '' THEN 'commerce'
        ELSE 'billing'
    END AS `origin`,
                                 IFNULL( diritem.`display_name`, `item`.`sku` ) AS `title`,
                                 diritem.`category` AS `category`,
                                 CONCAT( 'product_' , IFNULL( diritem.`id`, `item`.`sku` ) ) AS `product_id`,
                                 '' AS `start_date`,
                                 '' AS `ticket_name`,
                                 '' AS `ticket_type`
FROM
    `billing_invoice_item` AS `item`
INNER JOIN `billing_invoice` AS `inv` ON
    item.`invoice_id` = inv.`id`
    AND inv.`invoice_type` = ''
LEFT JOIN `billing_statement` AS `stm` ON
    stm.`invoice_number` = inv.`invoice_number`
    AND stm.`trans_id` = inv.`id`
LEFT JOIN (
    SELECT
        SUM( `amount` * -1 ) AS `amount_applied`,
        `invoice_id`
    FROM
        `billing_payments_applied`
    WHERE
        `invoice_id` != 0
        AND ( `apply_date` IS NOT NULL
            AND `apply_date` <= '2023-05-02 23:59:59' )
    GROUP BY
        `invoice_id`) AS invbal ON
    invbal.`invoice_id` = stm.`id`
LEFT JOIN `shopping_sku` sku ON
    sku.`sku` = item.`sku`
    AND sku.`origin` = 'directory'
LEFT JOIN `directory_items` diritem ON
    sku.`product_id` = diritem.`id`
WHERE
    item.`sku` NOT LIKE 'events_%'
    AND IF( inv.`due_date` < IFNULL( stm.`trans_date`, inv.`invoice_date` )
                                ,
    inv.`due_date`
                                ,
    IFNULL( stm.`trans_date`, inv.`invoice_date` ) ) <= '2023-05-02 23:59:59'


Solution

  • Are you sure about LEFT? That says that data in the "right" table could be missing, and you are willing to get NULLs in place of the SUM, etc. (And you might want to do COALESCE(...) to turn it into 0 in the outer query.)

    What you have should make one pass over billing_payment_applied to get all the available SUMs (after filtering by WHERE). This index may help:

    INDEX(invoice_id, apply_date, x)
    

    Another way to write the query, but not necessarily any faster in your case:

    SELECT  ...
            COALESCE( ( SELECT SUM(x) ... WHERE ... ), 0) AS bal,   -- No GROUP BY
        WHERE ...;
    

    (Without knowing which columns are in which tables, I don't see how to include billing_statement. Please qualify column names with the tables they are in.)