Search code examples
sqloracleanalytic-functions

Analytic Function SUM returns Average for window with same values


I was trying to write a query that resulted the running sum of given values. However, when applied SUM as analytic function I got result with average within a window.

Example: Consider the following query:

with tbl as 
(
select 'steve' "NAME", 2000 val from dual UNION ALL
select 'john' "NAME", 4000 val from dual UNION ALL
select 'peter' "NAME", 270 val from dual UNION ALL
select 'peter' "NAME", 270 val from dual UNION ALL
select 'peter' "NAME", 90 val from dual UNION ALL
select 'peter' "NAME", 450 val from dual UNION ALL
select 'hary' "NAME", 2772 val from dual UNION ALL
select 'may' "NAME", 2227.5 val from dual UNION ALL
select 'tom' "NAME", 500 val from dual UNION ALL
select 'sia' "NAME", 20000 val from dual
)
select name, val,
        sum(val) over (order by name) running_sum
from tbl;

Result:

enter image description here

What I actually wanted was

enter image description here

Which I got using ROWNUM:

with tbl as 
(
select 'steve' "NAME", 2000 val from dual UNION ALL
select 'john' "NAME", 4000 val from dual UNION ALL
select 'peter' "NAME", 270 val from dual UNION ALL
select 'peter' "NAME", 270 val from dual UNION ALL
select 'peter' "NAME", 90 val from dual UNION ALL
select 'peter' "NAME", 450 val from dual UNION ALL
select 'hary' "NAME", 2772 val from dual UNION ALL
select 'may' "NAME", 2227.5 val from dual UNION ALL
select 'tom' "NAME", 500 val from dual UNION ALL
select 'sia' "NAME", 20000 val from dual
)
select name, val,
        sum(val) over (order by rownum) running_sum
from tbl;

The running_sum for peter that is being displayed in the first result is actually the average of the total running_sum for peter. The analytic function is considering a window for peter as I've included "NAME" in windowing clause. But why is the query resulting in average for the window instead of running sum?


Solution

  • why is the query resulting in average for the window instead of running sum?

    It isn't an average for the window. In your rownum version the four values it shows are 6270, 6540, 6630 and 7080 - which average out to 6630.

    It is the running sum, but perhaps not of quite what you expected, and the order the output is being shown is obscuring the logic you're actually applying a bit.

    You can see where the numbers you are seeing are coming from by ordering the output:

    with tbl as 
    (
    select 'steve' "NAME", 2000 val from dual UNION ALL
    select 'john' "NAME", 4000 val from dual UNION ALL
    select 'peter' "NAME", 270 val from dual UNION ALL
    select 'peter' "NAME", 270 val from dual UNION ALL
    select 'peter' "NAME", 90 val from dual UNION ALL
    select 'peter' "NAME", 450 val from dual UNION ALL
    select 'hary' "NAME", 2772 val from dual UNION ALL
    select 'may' "NAME", 2227.5 val from dual UNION ALL
    select 'tom' "NAME", 500 val from dual UNION ALL
    select 'sia' "NAME", 20000 val from dual
    )
    select name, val,
            sum(val) over (order by name) running_sum
    from tbl
    order by name;
    
    NAME         VAL RUNNING_SUM
    ----- ---------- -----------
    hary        2772        2772
    john        4000        6772
    may       2227.5      8999.5
    peter        450     10079.5
    peter        270     10079.5
    peter        270     10079.5
    peter         90     10079.5
    sia        20000     30079.5
    steve       2000     32079.5
    tom          500     32579.5
    

    You can see that the running totals now make sense, from the order they are being evaluated by your windowing clause. All four values for peter are included in the running total for each of those lines - because that's all that's in the order by - and that total of 450+270+270+90=1080 is being added to the previous name's total of 8999.5.

    You could get different values for each peter row by including a row-based window clause:

    with tbl as 
    (
    select 'steve' "NAME", 2000 val from dual UNION ALL
    select 'john' "NAME", 4000 val from dual UNION ALL
    select 'peter' "NAME", 270 val from dual UNION ALL
    select 'peter' "NAME", 270 val from dual UNION ALL
    select 'peter' "NAME", 90 val from dual UNION ALL
    select 'peter' "NAME", 450 val from dual UNION ALL
    select 'hary' "NAME", 2772 val from dual UNION ALL
    select 'may' "NAME", 2227.5 val from dual UNION ALL
    select 'tom' "NAME", 500 val from dual UNION ALL
    select 'sia' "NAME", 20000 val from dual
    )
    select name, val,
            sum(val) over (order by name
              rows between unbounded preceding and current row) running_sum
    from tbl;
    
    NAME         VAL RUNNING_SUM
    ----- ---------- -----------
    hary        2772        2772
    john        4000        6772
    may       2227.5      8999.5
    peter        450      9449.5
    peter        270      9719.5
    peter        270      9989.5
    peter         90     10079.5
    sia        20000     30079.5
    steve       2000     32079.5
    tom          500     32579.5
    

    The order that the rows for the same name are evaluated is still indeterminate, since there is no instruction in that analytic clause on how to break ties.

    The entire result is now implicitly ordered (at least today, using a CTE and with my version and my optimiser's decisions), which might not be what you want; but then you should have an explicit order by anyway if the order does matter to you, whatever it is.