Search code examples
sqlsap-ase

SQL: aggregate sizes into bins


I have a (Sybase) table that has the following information:

 order_id   int   
 timestamp  datetime   
 action     char(1)      --i=inserted, c=corrected, r=removed   
 shares      int

It keeps track of the shares associated with an order (identified by its order_id) in a system. Using an example, the life of an order is as follows:

  timestamp action  shares      
  10:00:00  i       1000     -- initial Insert    
  10:06:30  c       900      -- one Change    
  10:07:12  c       800    
  10:50:20  r       800      -- Removal    
  11:10:10  i       600      -- 2nd Insert    
  11:12:10  r       600

In the example above, the order is active from 10:00:00 and 10:50:20, and again from 11:10:10 and 11:12:10

I have 1000s of such orders in the system and I need to plot with a histogram how many shares are active in a time series divided into bins/buckets of 5 minutes. If the number of shares for a given order changes more than once within the samenter code heree bin, I need to average the shares; as in the example above in the 10:05-10:10 bin where 1000, 900 and 800 can be avg'd out as 900.

Here's a more complex example:

1, "20140828 10:00:00",  "i", 1000
1, "20140828 10:06:00",  "c",  900
1, "20140828 10:07:12",  "c",  500
1, "20140828 10:10:10",  "c",  400
1, "20140828 10:20:20",  "r",  400
1, "20140828 10:30:10",  "i",  300
1, "20140828 10:32:10",  "r",  300

2, "20140828 09:51:00",  "i",  500
2, "20140828 10:08:30",  "r",  500

3, "20140828 10:10:00",  "i", 1000
3, "20140828 10:11:20",  "r", 1000

with its expected output:

10:00:00 1500
10:05:00 1300
10:10:00 1450
10:15:00 400
10:20:00 400
10:25:00 0
10:30:00 300
10:35:00 0
10:40:00 0
10:45:00 0
10:50:00 0
10:55:00 0

Thanks in advance for your help.


Solution

  • This is a variation on the Running Sum problem in SQL Server (either MS or Sybase, due to shared history), grouped by the bucket ID, which can simply be the time difference in minutes from a base time integer-divided by 5. So something like this will do:

    create table #t(
        BucketNo    int not null primary key clustered,
        Activity    int not null,
        Active      int not null
    );
    
    -- pre-aggregate activity data
    -- assumes prior existence of a zero-based NUMBERS or TALLY table
    insert #t(BucketNo,Activity,Active) 
    select 
         N
        ,isnull(Activity,0)
        ,0
    from NUMBERS 
    left join (
        select
             datediff(mm,0,TimeStamp) / 5 as BucketNo
            ,case action when 'i' then +1
                              'r' then -1
             end * shares          as Activity
            ,0 as Active  
        from  ActivityTable
        where action <> 'c'
        group by            datediff(mm,0,TimeStamp) / 5 
    
        union all
    
        select
             datediff(mm,0,TimeStamp) / 5 as BucketNo
            ,case action when 'i' then +1
                              'r' then -1
             end * shares
             - (  select top 1 i.shares 
                  from ActivityTable i
                  where i.order_id = c.order_id  and  i.TimeStamp > c.TimeStamp
                  order by i.TimeStamp desc
               ) as Activity
            ,0 as Active  
        from ActivityTable as c
        where c.action  = 'c         
        group by            datediff(mm,0,TimeStamp) / 5 
    ) data on data.BucketNo = N
    where N < 24 * 12; -- 5 minute buckets per day
    

    Now we use the SQL Server quirky update to process #t in clustered-index order to perform the Running Sum.

    declare @Shares   int = 0,
            @BucketNo int = 0;
    
    -- `quirky update` peculiar to SQL Server
    update #t
       set @Shares = Shares 
                   = case when BucketNo = @BucketNo
                          then @Shares + Activity
                          else 0
                     end,
           @BucketNo = BucketNo
    from #t with (TABLOCKX) -- not strictly necessary when using a temp table.
    option (MAXDOP 1);      -- prevent parallelization of query
    
    select BucketNo, Active from #t order by BucketNo
    go