Have a csv file where line have 5 fields separated by comma:
2020-07-31 15:15:55,xx,yy,zz,t
2020-07-31 15:16:57,xx,yy,zz,t
2020-07-31 15:17:00,xx,yy,zz,t
and I would like to use the first line as 0 in seconds (relative time), so the output is like this:
0,xx,yy,zz,t
62,xx,yy,zz,t
65,xx,yy,zz,t
I can be done with any programming language, like bash, awk, sed, perl... and overwrite the same file or create a new one.
Original answer by @anubhava (there were only 2 fields):
You may use this awk
with mktime
function.
This will output the 2 fields separated by commas:
awk 'BEGIN {
FS=OFS="," # set input and output field separators to comma
}
{
gsub(/[-:]/, " ", $1) # replace - and : with a space
tm = mktime($1) # convert date-time string to EPOCH value
if (NR == 1) # for 1st records store this value in first
first = tm
print (tm - first), $2 # print difference and 2nd field for each record
}' file
Result:
0,xx
62,xx
65,xx
This will detect all comma separated fields using the NF variable, so will work even if there's only the first field. That way lines of different number of fields can be mixed:
get.awk:
BEGIN {
FS=OFS="," # set input and output field separators to comma
}
{
gsub(/[-:]/, " ", $1) # replace - and : with a space
tm = mktime($1) # convert date-time string to EPOCH value
if (NR == 1) # for 1st records store this value in first
first = tm
{
printf (tm - first) # print difference
if (NF > 1) # print 2nd to last fields only if they exist
{
for (i = 2; i < NF; i += 1)
printf ","$i
print ","$NF
}
else
print "" # otherwise print newline after printf
}
}
file.csv:
2020-07-31 15:15:55,xx,yy,zz,t
2020-07-31 15:16:57,xx,yy,zz,t
2020-07-31 15:17:00,xx,yy,zz,t
2020-07-31 15:17:23,xx,yy,zz,abc,009-%5
2020-07-31 15:18:00
2020-07-31 15:19:00,xx
Result:
awk -f get.awk file.csv
0,xx,yy,zz,t
62,xx,yy,zz,t
65,xx,yy,zz,t
88,xx,yy,zz,abc,009-%5
125
185,xx