Search code examples
sqlgoogle-bigquerygoogle-query-language

Is there a way to group timestamp data by 30 day intervals starting from the min(date) and add them as columns


I am trying to use the min() value of a timestamp as a starting point and then group data by 30 day intervals in order to get a count of occurrences for each unique value within the timestamp date range as columns

i have two tables that i am joining together to get a count. Table 1 (page_creation) has 2 columns labeled link and dt_crtd. Table 2(page visits) has 2 other columns labeled url and date. the tables are being joined by joining table1.link = table2.pagevisits.

After the join i get a table similar to this:

+-------------------+------------------------+
| url               |     date               |
+-------------------+------------------------+
| www.google.com    | 2018-01-01 00:00:00'   |
| www.google.com    | 2018-01-02 00:00:00'   |
| www.google.com    | 2018-02-01 00:00:00'   |
| www.google.com    | 2018-02-05 00:00:00'   |
| www.google.com    | 2018-03-04 00:00:00'   |
| www.facebook.com  | 2014-01-05 00:00:00'   |
| www.facebook.com  | 2014-01-07 00:00:00'   |
| www.facebook.com  | 2014-04-02 00:00:00'   |
| www.facebook.com  | 2014-04-10 00:00:00'   |
| www.facebook.com  | 2014-04-11 00:00:00'   |
| www.facebook.com  | 2014-05-01 00:00:00'   |
| www.twitter.com   | 2016-02-01 00:00:00'   |
| www.twitter.com   | 2016-03-04 00:00:00'   |
+---------------------+----------------------+

what i am trying to get is results that pull this :

+-------------------+------------------------+------------+------------+-------------+
| url               | MIN_Date               | Interval 1  | Interval 2|  Interval 3 |
+-------------------+------------------------+-------------+-----------+-------------+
| www.google.com    | 2018-01-01 00:00:00'   |  2          |  2        |  1      
| www.facebook.com  | 2014-01-05 00:00:00'   |  2          |  0        |  1
| www.twitter.com   | 2016-02-01 00:00:00'   |  1          |  1        |  0    
+---------------------+----------------------+-------------+-----------+-------------+

So the 30 day intervals begin from the min(date) as shown in Interval 1 and are counted every 30 days.

Ive looked at other questions such as :

Group rows by 7 days interval starting from a certain date

MySQL query to select min datetime grouped by 30 day intervals

However it did not seem to answer my specific problem.

Ive also looked into pivot syntax but noticed it is only supported for certain DBMS.

Any help would be greatly appreciated.

Thank you.


Solution

  • If you are using BigQuery, I would recommend:

    • countif() to count a boolean value
    • timestamp_add() to add intervals to timestamps

    The exact boundaries are a bit vague, but I would go for:

    select pc.url,
           countif(pv.date >= pc.dt_crtd and
                   pv.date < timestamp_add(pc.dt_crtd, interval 30 day
                  ) as Interval_00_29,    
           countif(pv.date >= timestamp_add(pc.dt_crtd, interval 30 day) and
                   pv.date < timestamp_add(pc.dt_crtd, interval 60 day
                  ) as Interval_30_59,    
           countif(pv.date >= timestamp_add(pc.dt_crtd, interval 60 day) and
                   pv.date < timestamp_add(pc.dt_crtd, interval 90 day
                  ) as Interval_60_89
    from page_creation pc join
         page_visits pv
         on pc.link = pv.url
    group by pc.url