Search code examples
clickhouse

Clickhouse generating array of dates between two dates and interval


I've been asking a lot of questions using Clickhouse lately I hope someone can save me from this suffering.

I have multiple dates in my DB,

ex. my db has dates of everyday for month of may (2020-05-01 ~ 2020-05-31)

then I want to set 1st may and 31st may as starting date and end date

and get the SELECT result of dates in array with certain interval

for ex.

SELECT (some query about setting 2 timestamps as start / end and interval of 5 days)

then the expecting result will be

2020-05-01
2020-05-05
2020-05-10
2020-05-15 . . .goes on till 30

and I want this interval to be month, day, hour, minute, second, milisecond

but as far as I've studied about clickhouse, is it true that we cant use milisecond with clickhouse ????

if so, do I have to convert the date into UInt64 then do some interval tricks to get the result UInt64 and then cast it to datetime ???

please help me :(

example of input data:

start_at
2020-01-14 18:04:36.000
2020-01-14 19:07:48.000
2020-01-14 20:46:48.000
2020-01-14 23:21:12.000
2020-01-15 00:02:00.000
2020-01-15 03:36:00.000
2020-01-15 04:54:24.000
2020-01-15 08:04:00.000
2020-01-15 09:04:00.000
2020-01-15 10:04:00.000
2020-01-15 11:04:00.000
2020-01-15 14:04:00.000
2020-01-15 18:04:00.000
2020-01-16 11:04:00.000
2020-01-16 17:04:00.000
2020-01-16 17:04:00.000
2020-01-17 11:04:00.000
2020-01-17 18:04:00.000
2020-01-17 20:04:00.000
2020-01-18 01:04:00.000
2020-01-18 15:04:00.000

expected result (with interval of 2 days for example)

    time                        count
2020-01-14 18:04:36.000
2020-01-16 18:04:36.000
2020-01-18 18:04:36.000 

or with interval 1 day

         time                       count
2020-01-14 18:04:36.000
2020-01-15 18:04:36.000
2020-01-16 18:04:36.000
2020-01-17 18:04:36.000
2020-01-18 18:04:36.000

or with 12 hours

             time                       count
2020-01-14 18:04:36.000
2020-01-15 06:04:36.000
2020-01-15 18:04:36.000
2020-01-16 06:04:36.000
2020-01-16 18:04:36.000
2020-01-17 06:04:36.000
2020-01-17 18:04:36.000
2020-01-18 06:04:36.000
2020-01-18 18:04:36.000

Solution

  • Try this query:

    WITH
        toDateTime64('2020-01-14 18:04:36.000', 3) AS start_from,
        toUnixTimestamp64Milli(start_from) AS start_from_ts,
        ((12 * 60) * 60) * 1000 AS interval_msec
    SELECT
      fromUnixTimestamp64Milli(toInt64(ts)) dt,
      count
    FROM (    
      SELECT
          start_from_ts + interval_msec * interval_number AS ts,
          floor((toUnixTimestamp64Milli(start_at) - start_from_ts) / interval_msec) AS interval_number,
          count() AS count
      FROM 
      (
        /* emulate the test dataset */
        SELECT toDateTime64(dt, 3) AS start_at
          FROM (
            SELECT arrayJoin([
              ('2020-01-14 18:04:36.000'),
              ('2020-01-14 19:07:48.000'),
              ('2020-01-14 20:46:48.000'),
              ('2020-01-14 23:21:12.000'),
              ('2020-01-15 00:02:00.000'),
              ('2020-01-15 03:36:00.000'),
              ('2020-01-15 04:54:24.000'),
              ('2020-01-15 08:04:00.000'),
              ('2020-01-15 09:04:00.000'),
              ('2020-01-15 10:04:00.000'),
              ('2020-01-15 11:04:00.000'),
              ('2020-01-15 14:04:00.000'),
              ('2020-01-15 18:04:00.000'),
              ('2020-01-16 11:04:00.000'),
              ('2020-01-16 17:04:00.000'),
              ('2020-01-16 17:04:00.000'),
              ('2020-01-17 11:04:00.000'),
              ('2020-01-17 18:04:00.000'),
              ('2020-01-17 20:04:00.000'),
              ('2020-01-18 01:04:00.000'),
              ('2020-01-18 15:04:00.000')]) dt)
        )
      WHERE start_at >= start_from
      GROUP BY interval_number
      ORDER BY ts WITH FILL FROM toUnixTimestamp64Milli(toDateTime64('2020-01-14 18:04:36.000', 3)) TO toUnixTimestamp64Milli(toDateTime64('2020-01-18 18:04:36.000', 3)) STEP ((12 * 60) * 60) * 1000
      )
    
    /* result
    ┌──────────────────────dt─┬─count─┐
    │ 2020-01-14 18:04:36.000 │     7 │
    │ 2020-01-15 06:04:36.000 │     6 │
    │ 2020-01-15 18:04:36.000 │     0 │
    │ 2020-01-16 06:04:36.000 │     3 │
    │ 2020-01-16 18:04:36.000 │     0 │
    │ 2020-01-17 06:04:36.000 │     2 │
    │ 2020-01-17 18:04:36.000 │     2 │
    │ 2020-01-18 06:04:36.000 │     1 │
    └─────────────────────────┴───────┘
    */
    

    The query above calculates values for the interval 12 hours.

    To apply it to other intervals needs to modify: