Search code examples
pipelinedb

Pipelinedb: How to group stream data by each N minutes in continuous view


How to group data from pipelinedb's stream by each N minutes in continuous view select?

Pipelinedb's stream gets data about the events that comes from a many remote hosts. I need to group this events by type, ip and time intervals in 5 minutes, for example, and count them.

So on input I have (very roughly):

time  | ip               | type      
------------------------------------
22:35 | 111.111.111.111  | page_open <-- new interaval, ends in 22:40
22:36 | 111.111.111.111  | page_open
22:37 | 111.111.111.111  | page_close 
22:42 | 111.111.111.111  | page_close <-- event comes in next interval, ends in 22:45
22:42 | 222.111.111.111  | page_open 
22:43 | 222.111.111.111  | page_open
22:44 | 222.111.111.111  | page_close 
22:44 | 111.111.111.111  | page_open

And what must be in continuous view select:

time  | ip               | type       | count
---------------------------------------------
22:40 | 111.111.111.111  | page_open  | 2
22:40 | 111.111.111.111  | page_close | 1
22:45 | 111.111.111.111  | page_open  | 1
22:45 | 111.111.111.111  | page_close | 1
22:45 | 222.111.111.111  | page_open  | 2
22:45 | 222.111.111.111  | page_close | 1

p.s. Sorry for my english


Solution

  • You can use the date_round(column, interval) [0] function for that. For example,

    CREATE CONTINUOUS VIEW bucketed AS
      SELECT date_round(time, '5 minutes') AS bucket, COUNT(*)
        FROM input_stream GROUP BY bucket;
    

    [0] http://docs.pipelinedb.com/builtin.html?highlight=date_round