Search code examples
bashshellcsvtimetext-processing

Convert YYYY-MM-DD HH:MM:SS to seconds relative to 0


Have a csv file where line have 5 fields separated by comma:

2020-07-31 15:15:55,xx,yy,zz,t
2020-07-31 15:16:57,xx,yy,zz,t
2020-07-31 15:17:00,xx,yy,zz,t

and I would like to use the first line as 0 in seconds (relative time), so the output is like this:

0,xx,yy,zz,t
62,xx,yy,zz,t
65,xx,yy,zz,t

I can be done with any programming language, like bash, awk, sed, perl... and overwrite the same file or create a new one.


Solution

  • Original answer by @anubhava (there were only 2 fields):

    You may use this awk with mktime function. This will output the 2 fields separated by commas:

    awk 'BEGIN {
       FS=OFS=","                 # set input and output field separators to comma
    }
    {
       gsub(/[-:]/, " ", $1)      # replace - and : with a space
       tm = mktime($1)            # convert date-time string to EPOCH value
       if (NR == 1)               # for 1st records store this value in first
          first = tm
       print (tm - first), $2     # print difference and 2nd field for each record
    }' file
    

    Result:

    0,xx
    62,xx
    65,xx
    

    Improved by @smeterlink

    This will detect all comma separated fields using the NF variable, so will work even if there's only the first field. That way lines of different number of fields can be mixed:

    get.awk:

    BEGIN {
       FS=OFS=","                 # set input and output field separators to comma
    }
    {
       gsub(/[-:]/, " ", $1)      # replace - and : with a space
       tm = mktime($1)            # convert date-time string to EPOCH value
       if (NR == 1)               # for 1st records store this value in first
          first = tm
       {
          printf (tm - first)     # print difference
          if (NF > 1)             # print 2nd to last fields only if they exist
          {
             for (i = 2; i < NF; i += 1)
             printf ","$i
             print ","$NF
          }
          else
             print ""             # otherwise print newline after printf
       }
    }
    

    file.csv:

    2020-07-31 15:15:55,xx,yy,zz,t
    2020-07-31 15:16:57,xx,yy,zz,t
    2020-07-31 15:17:00,xx,yy,zz,t
    2020-07-31 15:17:23,xx,yy,zz,abc,009-%5
    2020-07-31 15:18:00
    2020-07-31 15:19:00,xx
    

    Result:

    awk -f get.awk file.csv
    0,xx,yy,zz,t
    62,xx,yy,zz,t
    65,xx,yy,zz,t
    88,xx,yy,zz,abc,009-%5
    125
    185,xx