Search code examples
sqloraclehistogramfrequency-distribution

Optimal way to create a histogram/frequency distribution in Oracle?


I have an events table with two columns eventkey (unique, primary-key) and createtime, which stores the creation time of the event as the number of milliseconds since Jan 1 1970 in a NUMBER column.

I would like to create a "histogram" or frequency distribution that shows me how many events were created in each hour of the past week.

Is this the best way to write such a query in Oracle, using the width_bucket() function? Is it possible to derive the number of rows that fall into each bucket using one of the other Oracle analytic functions rather than using width_bucket to determine what bucket number each row belongs to and doing a count(*) over that?

-- 1305504000000 = 5/16/2011 12:00am GMT
-- 1306108800000 = 5/23/2011 12:00am GMT
select 
timestamp '1970-01-01 00:00:00' + numtodsinterval((1305504000000/1000 + (bucket * 60 * 60)), 'second') period_start,
numevents
from (
  select bucket, count(*) as events from (
    select eventkey, createtime, 
    width_bucket(createtime, 1305504000000, 1306108800000, 24 * 7) bucket
    from events 
    where createtime between 1305504000000 and 1306108800000
  ) group by bucket
) 
order by period_start

Solution

  • If your createtime were a date column, this would be trivial:

    SELECT TO_CHAR(CREATE_TIME, 'DAY:HH24'), COUNT(*) 
      FROM EVENTS
     GROUP BY TO_CHAR(CREATE_TIME, 'DAY:HH24');
    

    As it is, casting the createtime column isn't too hard:

    select TO_CHAR( 
             TO_DATE('19700101', 'YYYYMMDD') + createtime / 86400000), 
             'DAY:HH24') AS BUCKET, COUNT(*)
       FROM EVENTS
      WHERE createtime between 1305504000000 and 1306108800000
     group by TO_CHAR( 
             TO_DATE('19700101', 'YYYYMMDD') + createtime / 86400000), 
             'DAY:HH24') 
     order by 1
    

    If, alternatively, you're looking for the fencepost values (for example, where do I go from the first decile (0-10%) to the next (11-20%), you'd do something like:

    select min(createtime) over (partition by decile) as decile_start,
           max(createtime) over (partition by decile) as decile_end,
           decile
      from (select createtime, 
                   ntile (10) over (order by createtime asc) as decile
              from events
             where createtime between 1305504000000 and 1306108800000
           )