I was trying to write a query that resulted the running sum of given values. However, when applied SUM as analytic function I got result with average within a window.
Example: Consider the following query:
with tbl as
(
select 'steve' "NAME", 2000 val from dual UNION ALL
select 'john' "NAME", 4000 val from dual UNION ALL
select 'peter' "NAME", 270 val from dual UNION ALL
select 'peter' "NAME", 270 val from dual UNION ALL
select 'peter' "NAME", 90 val from dual UNION ALL
select 'peter' "NAME", 450 val from dual UNION ALL
select 'hary' "NAME", 2772 val from dual UNION ALL
select 'may' "NAME", 2227.5 val from dual UNION ALL
select 'tom' "NAME", 500 val from dual UNION ALL
select 'sia' "NAME", 20000 val from dual
)
select name, val,
sum(val) over (order by name) running_sum
from tbl;
Result:
What I actually wanted was
Which I got using ROWNUM:
with tbl as
(
select 'steve' "NAME", 2000 val from dual UNION ALL
select 'john' "NAME", 4000 val from dual UNION ALL
select 'peter' "NAME", 270 val from dual UNION ALL
select 'peter' "NAME", 270 val from dual UNION ALL
select 'peter' "NAME", 90 val from dual UNION ALL
select 'peter' "NAME", 450 val from dual UNION ALL
select 'hary' "NAME", 2772 val from dual UNION ALL
select 'may' "NAME", 2227.5 val from dual UNION ALL
select 'tom' "NAME", 500 val from dual UNION ALL
select 'sia' "NAME", 20000 val from dual
)
select name, val,
sum(val) over (order by rownum) running_sum
from tbl;
The running_sum for peter that is being displayed in the first result is actually the average of the total running_sum for peter. The analytic function is considering a window for peter as I've included "NAME" in windowing clause. But why is the query resulting in average for the window instead of running sum?
why is the query resulting in average for the window instead of running sum?
It isn't an average for the window. In your rownum
version the four values it shows are 6270, 6540, 6630 and 7080 - which average out to 6630.
It is the running sum, but perhaps not of quite what you expected, and the order the output is being shown is obscuring the logic you're actually applying a bit.
You can see where the numbers you are seeing are coming from by ordering the output:
with tbl as
(
select 'steve' "NAME", 2000 val from dual UNION ALL
select 'john' "NAME", 4000 val from dual UNION ALL
select 'peter' "NAME", 270 val from dual UNION ALL
select 'peter' "NAME", 270 val from dual UNION ALL
select 'peter' "NAME", 90 val from dual UNION ALL
select 'peter' "NAME", 450 val from dual UNION ALL
select 'hary' "NAME", 2772 val from dual UNION ALL
select 'may' "NAME", 2227.5 val from dual UNION ALL
select 'tom' "NAME", 500 val from dual UNION ALL
select 'sia' "NAME", 20000 val from dual
)
select name, val,
sum(val) over (order by name) running_sum
from tbl
order by name;
NAME VAL RUNNING_SUM
----- ---------- -----------
hary 2772 2772
john 4000 6772
may 2227.5 8999.5
peter 450 10079.5
peter 270 10079.5
peter 270 10079.5
peter 90 10079.5
sia 20000 30079.5
steve 2000 32079.5
tom 500 32579.5
You can see that the running totals now make sense, from the order they are being evaluated by your windowing clause. All four values for peter
are included in the running total for each of those lines - because that's all that's in the order by
- and that total of 450+270+270+90=1080 is being added to the previous name's total of 8999.5.
You could get different values for each peter
row by including a row-based window clause:
with tbl as
(
select 'steve' "NAME", 2000 val from dual UNION ALL
select 'john' "NAME", 4000 val from dual UNION ALL
select 'peter' "NAME", 270 val from dual UNION ALL
select 'peter' "NAME", 270 val from dual UNION ALL
select 'peter' "NAME", 90 val from dual UNION ALL
select 'peter' "NAME", 450 val from dual UNION ALL
select 'hary' "NAME", 2772 val from dual UNION ALL
select 'may' "NAME", 2227.5 val from dual UNION ALL
select 'tom' "NAME", 500 val from dual UNION ALL
select 'sia' "NAME", 20000 val from dual
)
select name, val,
sum(val) over (order by name
rows between unbounded preceding and current row) running_sum
from tbl;
NAME VAL RUNNING_SUM
----- ---------- -----------
hary 2772 2772
john 4000 6772
may 2227.5 8999.5
peter 450 9449.5
peter 270 9719.5
peter 270 9989.5
peter 90 10079.5
sia 20000 30079.5
steve 2000 32079.5
tom 500 32579.5
The order that the rows for the same name are evaluated is still indeterminate, since there is no instruction in that analytic clause on how to break ties.
The entire result is now implicitly ordered (at least today, using a CTE and with my version and my optimiser's decisions), which might not be what you want; but then you should have an explicit order by
anyway if the order does matter to you, whatever it is.