Search code examples
bashshelldebiancommand-line-interfacegnu

How to extract columns from CSV file, process and create a CSV file based on the extract and processing result?


The is a snipped from the source CSV file.

%status,date,job,project,start,end,description
%
//,18.03.2021,sib,sib-dede,07:00,15:00,dede-mongo
%
//,11.06.2021,sib,sib-dede,07:00,15:00,dede-mongo
%
//,24.06.2021,sib,sib-dede,07:00,15:00,dede-mongo
%
?,02.08.2021,sib,sib-accounting,14:35,16:35,business-plan
%
?,13.10.2021,sb,sb-accounting,11:30,12:00,e-mail-pump

I like to extract from the source CSV file the start time in column 5 and the end time in column 6.

Next, based on start and end time I like to process the duration in hours or minutes (don't mind).

Eventually, I like to grab the source CSV file, insert a new column between the existing column 6 and 7 with the processed duration and save this addition in a result CSV file.

Done anyone has an idea how to face this challenge on a GNU Debian Linux command line?

I am aware that I can cut specific columns from a CSV file like this.

cut -d, -f5,6 < ./source.csv > ./result.csv

However, I am still missing the duration processing and the result CVS file creation.

PS: I have a preference for Bash.

PPS: There are related questions like this but I found none close to this question.


Solution

  • I have a preference for Bash

    But you could do all of it in one go, e.g. using mktime from GNU Awk (assuming the % signs (or the subsequent newlines) in your sample are just a typo, and that the time in column 6 is higher than the one in column 5, otherwise the result is negative):

    awk -F , -v OFS=, '{
      $8=$7; $7=(
        mktime(sprintf("0 0 0 %d %d 0", substr($6,1,2), substr($6,4,2))) - 
        mktime(sprintf("0 0 0 %d %d 0", substr($5,1,2), substr($5,4,2)))
      ) / 60; print
    }' source.csv