Search code examples
datetimegoogle-bigquerydst

Get number of milliseconds for a localised date, taking into account daylight savings


I have data in Google BigQuery that looks like this:


sample_date_time_UTC     time_zone       milliseconds_between_samples
--------                 ---------       ----------------------------
2019-03-31 01:06:03 UTC  Europe/Paris    60000
2019-03-31 01:16:03 UTC  Europe/Paris    60000
...

Data samples are expected at regular intervals, indicated by the value of the milliseconds_between_samples field:

The time_zone is a string that represents a Google Cloud Supported Timezone Value


I'm then checking the ratio of the actual number of samples compared to the expected number over any particular day, for any single day range (expressed as a local date, for the given time_zone):

with data as 
  ( 
    select 
      -- convert sample_date_time_UTC to equivalent local datetime for the timezone
      DATETIME(sample_date_time_UTC,time_zone) as localised_sample_date_time, 
      milliseconds_between_samples 
    from  `mytable` 
    where sample_date_time between '2019-03-31 00:00:00.000000+01:00' and '2019-04-01 00:00:00.000000+02:00'
  ) 

select date(localised_sample_date_time) as localised_date, count(*)/(86400000/avg(milliseconds_between_samples)) as ratio_of_daily_sample_count_to_expected 
from data 
group by localised_date 
order by localised_date 

The problem is that this has a bug, as I've hardcoded the expected number of milliseconds in a day to 86400000. This is incorrect, as when daylight saving begins in the specified time_zone (Europe/Paris), a day is 1hr shorter. When daylight saving ends, the day is 1hr longer.

So, the query above is incorrect. It queries data for 31st March of this year in the Europe/Paris timezone (which is when daylight saving started in that timezone). The milliseconds in that day should be 82800000.

Within the query, how can I get the correct number of milliseconds for the specified localised_date?

Update:

I tried doing this to see what it returns:

select DATETIME_DIFF(DATETIME('2019-04-01 00:00:00.000000+02:00', 'Europe/Paris'), DATETIME('2019-03-31 00:00:00.000000+01:00', 'Europe/Paris'), MILLISECOND)

That didn't work - I get 86400000


Solution

  • Thanks @Juta, for the hint on using UTC times for the calculation. As I'm grouping my data for each day by a localised date, I figured out that I can work out milliseconds for each day by getting the beginning and end datetime (in UTC), for my 'localised' date, using the following logic:

    -- get UTC start datetime for localised date
    -- get UTC end datetime for localised date
    
    -- this then gives the milliseconds for that localised date:
    datetime_diff(utc_end_datetime, utc_start_datetime, MILLISECOND);
    

    So, my full query becomes:

    with daily_sample_count as (
      with data as 
        ( 
          select 
            -- get the date in the local timezone, for sample_date_time_UTC
            DATE(sample_date_time_UTC,time_zone) as localised_date, 
            milliseconds_between_samples 
          from  `mytable` 
          where sample_date_time between '2019-03-31 00:00:00.000000+01:00' and '2019-04-01 00:00:00.000000+02:00'
        ) 
    
      select
        localised_date,
        count(*) as daily_record_count,
        avg(milliseconds_between_samples) as daily_avg_millis_between_samples,
        datetime(timestamp(localised_date, time_zone)) as utc_start_datetime,
        datetime(timestamp(date_add(localised_date, interval 1 day), time_zone)) as utc_end_datetime
      from data 
    )
    
    select
      localised_date,
      -- apply calculation for ratio_of_daily_sample_count_to_expected
      -- based on the actual vs expected number of samples for the day
      -- no. of milliseconds in the day changes, when transitioning in/out of daylight saving - so we calculate milliseconds in the day
      daily_record_count/(datetime_diff(utc_end_datetime, utc_start_datetime, MILLISECOND)/daily_avg_millis_between_samples) as ratio_of_daily_sample_count_to_expected
    from
      daily_sample_count