I have a (Sybase) table that has the following information:
order_id int
timestamp datetime
action char(1) --i=inserted, c=corrected, r=removed
shares int
It keeps track of the shares associated with an order (identified by its order_id) in a system. Using an example, the life of an order is as follows:
timestamp action shares
10:00:00 i 1000 -- initial Insert
10:06:30 c 900 -- one Change
10:07:12 c 800
10:50:20 r 800 -- Removal
11:10:10 i 600 -- 2nd Insert
11:12:10 r 600
In the example above, the order is active from 10:00:00 and 10:50:20, and again from 11:10:10 and 11:12:10
I have 1000s of such orders in the system and I need to plot with a histogram how many shares are active in a time series divided into bins/buckets of 5 minutes. If the number of shares for a given order changes more than once within the samenter code heree bin, I need to average the shares; as in the example above in the 10:05-10:10 bin where 1000, 900 and 800 can be avg'd out as 900.
Here's a more complex example:
1, "20140828 10:00:00", "i", 1000
1, "20140828 10:06:00", "c", 900
1, "20140828 10:07:12", "c", 500
1, "20140828 10:10:10", "c", 400
1, "20140828 10:20:20", "r", 400
1, "20140828 10:30:10", "i", 300
1, "20140828 10:32:10", "r", 300
2, "20140828 09:51:00", "i", 500
2, "20140828 10:08:30", "r", 500
3, "20140828 10:10:00", "i", 1000
3, "20140828 10:11:20", "r", 1000
with its expected output:
10:00:00 1500
10:05:00 1300
10:10:00 1450
10:15:00 400
10:20:00 400
10:25:00 0
10:30:00 300
10:35:00 0
10:40:00 0
10:45:00 0
10:50:00 0
10:55:00 0
Thanks in advance for your help.
This is a variation on the Running Sum problem in SQL Server (either MS or Sybase, due to shared history), grouped by the bucket ID, which can simply be the time difference in minutes from a base time integer-divided by 5. So something like this will do:
create table #t(
BucketNo int not null primary key clustered,
Activity int not null,
Active int not null
);
-- pre-aggregate activity data
-- assumes prior existence of a zero-based NUMBERS or TALLY table
insert #t(BucketNo,Activity,Active)
select
N
,isnull(Activity,0)
,0
from NUMBERS
left join (
select
datediff(mm,0,TimeStamp) / 5 as BucketNo
,case action when 'i' then +1
'r' then -1
end * shares as Activity
,0 as Active
from ActivityTable
where action <> 'c'
group by datediff(mm,0,TimeStamp) / 5
union all
select
datediff(mm,0,TimeStamp) / 5 as BucketNo
,case action when 'i' then +1
'r' then -1
end * shares
- ( select top 1 i.shares
from ActivityTable i
where i.order_id = c.order_id and i.TimeStamp > c.TimeStamp
order by i.TimeStamp desc
) as Activity
,0 as Active
from ActivityTable as c
where c.action = 'c
group by datediff(mm,0,TimeStamp) / 5
) data on data.BucketNo = N
where N < 24 * 12; -- 5 minute buckets per day
Now we use the SQL Server quirky update to process #t in clustered-index order to perform the Running Sum.
declare @Shares int = 0,
@BucketNo int = 0;
-- `quirky update` peculiar to SQL Server
update #t
set @Shares = Shares
= case when BucketNo = @BucketNo
then @Shares + Activity
else 0
end,
@BucketNo = BucketNo
from #t with (TABLOCKX) -- not strictly necessary when using a temp table.
option (MAXDOP 1); -- prevent parallelization of query
select BucketNo, Active from #t order by BucketNo
go