Bash: convert local time to UTC during the DST end


This has probably been asked many times before, but I cannot find anything helpful.

I have a CSV file with time series data. The dates are local, with no time zone or DST indication. I use bash and the date utility to convert all dates in the file to UTC. However, when DST ends in October, one hour of local timestamps occurs twice, and the two occurrences mean different instants. How should I handle this, e.g. is there any argument to date command to indicate that the original local date is before/after DST end?

Here is the command I use for the conversion (in the loop, the date is stored in a variable):

date -u -Iseconds --date='TZ="Europe/Rome" 2021-10-31T01:55:00'
2021-10-31T01:55:00,values  # converts to 2021-10-30T23:55:00+00:00, OK
2021-10-31T01:56:00,values  # converts to 2021-10-30T23:56:00+00:00, OK
2021-10-31T01:57:00,values  # converts to 2021-10-30T23:57:00+00:00, OK
2021-10-31T01:58:00,values  # converts to 2021-10-30T23:58:00+00:00, OK
2021-10-31T01:59:00,values  # converts to 2021-10-30T23:59:00+00:00, OK
2021-10-31T02:00:00,values  # converts to 2021-10-31T01:00:00+00:00, wrong
2021-10-31T02:01:00,values  # converts to 2021-10-31T01:01:00+00:00, wrong
2021-10-31T02:02:00,values  # converts to 2021-10-31T01:02:00+00:00, wrong
2021-10-31T02:03:00,values  # converts to 2021-10-31T01:03:00+00:00, wrong
2021-10-31T02:04:00,values  # converts to 2021-10-31T01:04:00+00:00, wrong
2021-10-31T02:05:00,values  # converts to 2021-10-31T01:05:00+00:00, wrong
...
2021-10-31T02:55:00,values  # converts to 2021-10-31T01:55:00+00:00, wrong
2021-10-31T02:56:00,values  # converts to 2021-10-31T01:56:00+00:00, wrong
2021-10-31T02:57:00,values  # converts to 2021-10-31T01:57:00+00:00, wrong
2021-10-31T02:58:00,values  # converts to 2021-10-31T01:58:00+00:00, wrong
2021-10-31T02:59:00,values  # converts to 2021-10-31T01:59:00+00:00, wrong
2021-10-31T02:00:00,values  # converts to 2021-10-31T01:00:00+00:00, OK
2021-10-31T02:01:00,values  # converts to 2021-10-31T01:01:00+00:00, OK
2021-10-31T02:02:00,values  # converts to 2021-10-31T01:02:00+00:00, OK
2021-10-31T02:03:00,values  # converts to 2021-10-31T01:03:00+00:00, OK
2021-10-31T02:04:00,values  # converts to 2021-10-31T01:04:00+00:00, OK
2021-10-31T02:05:00,values  # converts to 2021-10-31T01:05:00+00:00, OK
...

EDITED (as requested)

Input file:

2021-10-31T01:45:00,value1,value2
2021-10-31T02:00:00,value1,value2
2021-10-31T02:15:00,value1,value2
2021-10-31T02:30:00,value1,value2
2021-10-31T02:45:00,value1,value2
2021-10-31T02:00:00,value1,value2
2021-10-31T02:15:00,value1,value2
2021-10-31T02:30:00,value1,value2
2021-10-31T02:45:00,value1,value2
2021-10-31T03:00:00,value1,value2

Output file:

2021-10-30T23:45:00+00:00,value1,value2
2021-10-31T00:00:00+00:00,value1,value2
2021-10-31T00:15:00+00:00,value1,value2
2021-10-31T00:30:00+00:00,value1,value2
2021-10-31T00:45:00+00:00,value1,value2
2021-10-31T01:00:00+00:00,value1,value2
2021-10-31T01:15:00+00:00,value1,value2
2021-10-31T01:30:00+00:00,value1,value2
2021-10-31T01:45:00+00:00,value1,value2
2021-10-31T02:00:00+00:00,value1,value2

How will the script know whether 2021-10-31T02:15:00 in the input file is CET or CEST? Obviously by its position in the file: the first occurrence is CEST and the second one is CET. This is easy to explain but complex to implement in bash, and all because someone used local time without a time zone in the CSV file.


Solution

  • The question is:

    is there any argument to date command to indicate that the original local date is before/after DST end?

    Yes, you add the time zone abbreviation. In this case: CET for Central European Time or CEST for Central European Summer Time.

    $ date -u -Iseconds --date 'TZ="Europe/Rome" 2021-10-31T02:59:00 CET'
    2021-10-31T01:59:00+00:00
    $ date -u -Iseconds --date 'TZ="Europe/Rome" 2021-10-31T02:59:00 CEST'
    2021-10-31T00:59:00+00:00
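
The explicit abbreviation can also be combined with the "position in the file" idea from the question. The following is only a sketch under stated assumptions, not a tested production script: it assumes GNU date, rows in chronological order, the timestamp in the first CSV column, and that CEST and CET are the only abbreviations in play for Europe/Rome. The function name to_utc and the variable names are made up for illustration. Each timestamp is tried as CEST first and then as CET; a reading is accepted only if it maps back to the same wall-clock time and lies after the previous row.

```bash
#!/usr/bin/env bash
# Sketch: convert zone-less local timestamps (first CSV column) to UTC,
# resolving the repeated hour at DST end by position in the file.
# Assumptions: GNU date, chronologically ordered rows, zone Europe/Rome.

zone="Europe/Rome"

to_utc() {
    local prev=-1 ts rest abbr epoch back chosen
    while IFS=, read -r ts rest; do
        chosen=""
        # Try the summer-time reading first, then the winter-time one.
        for abbr in CEST CET; do
            epoch=$(date --date "TZ=\"$zone\" $ts $abbr" +%s) || continue
            # Keep a candidate only if it maps back to the same wall-clock time.
            back=$(TZ="$zone" date --date "@$epoch" +%Y-%m-%dT%H:%M:%S)
            [[ $back == "$ts" ]] || continue
            # Rows are assumed ordered: take the first reading after the previous row.
            if (( epoch > prev )); then
                chosen=$epoch
                break
            fi
        done
        if [[ -z $chosen ]]; then
            echo "cannot resolve: $ts" >&2
            continue
        fi
        prev=$chosen
        printf '%s,%s\n' "$(date -u -Iseconds --date "@$chosen")" "$rest"
    done
}
```

It could be used as to_utc < input.csv > output.csv. Rows that cannot be resolved (for example a timestamp out of order) are reported on stderr and skipped, so it is worth checking stderr before trusting the output.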