Search code examples
sqlpostgresqlquery-performance

Finding cumulative totals with condition and group by


Here is my case:

I would like to calculate quantity and price for a given item on any given date.

Prices are calculated using total item quantity and unit price so price changes with respect to item's quantity.

warehouse_1 states that item was shipped from that warehouse, warehouse_2 states that item was sent to that warehouse.

Here is my logic:

Fetch deliveries for each item and sum their quantities. (1st CTE)

Find the sum of quantities in both warehouses separately. (2nd CTE)

Calculate final quantity and multiply it by unit price.

Show result which consists of item id, quantity and price.

I wrote a query which does the calculations correctly BUT it gets exponentially slower when data count gets bigger. (Takes 5 seconds on my DB with 6k rows, almost locks DB on my coworker's DB with 21k rows)

How can I optimize this query? I am doing cumulative calculations on 2nd CTE for each row coming from 1st CTE and that needs a rework I believe.

Can I use LAG() function for this use case? I tried that with something like

LAG(a.deliveryTotal) over(order by a.updated desc rows between unbounded preceding and current row)

instead of the CASE block in 2nd CTE but I can't seem to figure out how to use filter() or put a condition inside LAG() statement.

Here is my query:

`

with deliveriesCTE as (
select
    row_number() over(partition by it.id
order by
    dd.updated asc) as rn,
    sum(dd.quantity) as deliveryTotal,
    dd.updated as updated,
    it.id as item_id,
    d.warehouse_1 as outWH,
    d.warehouse_2 as inWH,
    d.company_code as company
from
    deliveries d
join deliveries_detail dd on
    dd.deliveries_id = d.id
join items it on
    it.id = dd.item_id
where
    ...
group by
    dd.updated,
    it.id,
    d.warehouse_1,
    d.warehouse_2,
    d.company_code
order by
    dd.updated asc),
cumulativeTotalsByUnit as (
select
    distinct on
    (a.item_id) a.rn,
    a.deliveryTotal,
    a.updated,
    a.item_id,
    a.outWH,
    a.inWH,
    a.company,
    case
        when a.rn = 1
        and a.outWH is not null then coalesce(a.deliveryTotal,
        0)
        else (
        select
            coalesce(sum(b.deliveryTotal) filter(
            where b.outWH is not null),
            0)
        from
            deliveriesCTE b
        where
            a.item_id = b.item_id
            and b.rn <= a.rn)
    end as outWHTotal,
    case
        when a.rn = 1
        and a.inWH is not null then coalesce(a.deliveryTotal,
        0)
        else (
        select
            coalesce(sum(b.deliveryTotal) filter(
            where b.inWH is not null),
            0)
        from
            deliveriesCTE b
        where
            a.item_id = b.item_id
            and b.rn <= a.rn)
    end as inWHTotal
from
    deliveriesCTE a
order by
    a.item_id,
    a.updated desc)
select
    resultView.item_id,
    resultView.quantity,
    resultView.price
from
    (
    select
        cumTotals.item_id,
        cumTotals.inWHTotal - cumTotals.outWHTotal as quantity,
        p.price * (cumTotals.inWHTotal - cumTotals.outWHTotal) as price
    from
        prices p
    join cumulativeTotalsByUnit cumTotals on
        cumTotals.item_id = p.item_id ) resultView
where
    resultView.rn = 1;

`


Solution

  • It's hard to say for use without a MCV, but my guess on what you are trying to do is do a Windowed SUM() calculation as opposed to LAG(). There is documentation Here.

    The query cumulativeTotalsByUnit shouldn't be necessary and is likely quadratic to do the complex self-referential join.

    Your delivery CTE should look like:

    select
        sum(dd.quantity) over (partition by it.id ORDER BY dd.updated asc) as deliveryTotal,
        dd.updated as updated,
        it.id as item_id,
        d.warehouse_1 as outWH,
        d.warehouse_2 as inWH,
        d.company_code as company
    from
        deliveries d
    join deliveries_detail dd on
        dd.deliveries_id = d.id
    join items it on
        it.id = dd.item_id
    where
        ...
    group by
        dd.updated,
        it.id,
        d.warehouse_1,
        d.warehouse_2,
        d.company_code
    order by
        dd.updated asc