Tags: linux, bash, csv, delimiter, file, parsing

Multi-level parsing of CSV data in a shell script


Hi, I have CSV data in the following format:

ColumnHeader1,ColumnHeader2,ColumnHeader3
valcol1p1,name=testapp1 environment=dev coldata=My_Test_Logs @$ 192.168.1.1 @$ r1 @$ r2 @$ POST API ,valcol3p1
valcol1p1,name=testapp2 environment=qa coldata=My_Test_Logs @$ 192.168.1.1 @$ r1 @$ r2 @$ GET API ,valcol3p1 


I need to extract the ColumnHeader2 column, take the data after My_Test_Logs, and split it on the delimiter '@$'. So for each CSV line I would get 4 values. I then need to concatenate them with the same '@$' delimiter and place the result in a CSV.

The output would be something like this:

(image: expected output)

Now, I have solved it in parts.

For example, to get the ColumnHeader2 column data:

awk -F "\"*,\"*" '{print $2}' Mytest.csv
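
For reference, if Mytest.csv holds the sample rows shown above, this prints roughly the following (the header line comes along too, since nothing filters it out):

ColumnHeader2
name=testapp1 environment=dev coldata=My_Test_Logs @$ 192.168.1.1 @$ r1 @$ r2 @$ POST API
name=testapp2 environment=qa coldata=My_Test_Logs @$ 192.168.1.1 @$ r1 @$ r2 @$ GET API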

or, to take only the first x fields using a multi-character delimiter:

awk -F"[@][$]" '{print $1,$2,$3,$4}' Mytest1.csv
where Mytest1.csv contains the extracted ColumnHeader2 data.
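
For reference, on that extracted column this prints roughly the following; note that the spaces surrounding each @$ stay inside the fields and the default output separator is a single space, which is why the values come out padded rather than neatly joined:

ColumnHeader2
name=testapp1 environment=dev coldata=My_Test_Logs   192.168.1.1   r1   r2
name=testapp2 environment=qa coldata=My_Test_Logs   192.168.1.1   r1   r2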

But putting the whole logic together, extracting and then concatenating, is giving me some issues. Can someone please help here? I need a single script that works on my CSV and writes the results to another CSV, rather than using multiple CSV or text files in between.


Solution

  • This should meet both your requirements:

    awk -F',| *@[$] *' -v OFS='@$' 'NR==1 {print "outCol1","outCol2","outCol3"}; NR > 1 { print $3,$4,$5}' sample.txt
    
    outCol1@$outCol2@$outCol3
    192.168.1.1@$r1@$r2
    192.168.1.1@$r1@$r2
    
    • -F',| *@[$] *' - in awk the field separator is a regular expression; this one matches both , and @$, and also trims the spaces around @$
    • -v OFS='@$' - the default output field separator is a space; this sets it to '@$'
    • NR == 1 {print "outCol1","outCol2","outCol3"} - for the first line, print the new header
    • NR > 1 { print $3,$4,$5} - for the remaining lines, print fields $3, $4 and $5
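
    To write the result straight into another CSV, as asked, the same command can simply be redirected; the file names below are only placeholders:

    # input/output names are examples - adjust to your own files
    awk -F',| *@[$] *' -v OFS='@$' 'NR==1 {print "outCol1","outCol2","outCol3"}; NR > 1 { print $3,$4,$5}' Mytest.csv > MyResult.csv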

    Note: this assumes that there are no escaped , or @$ sequences in the rest of the CSV; otherwise you should use a proper CSV parser.
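
    If quoted CSV fields containing commas ever do appear, a middle ground (a minimal sketch, assuming GNU awk and the same sample file name) is gawk's FPAT, which describes what a field looks like instead of what separates fields; it still does not handle escaped quotes, so a real CSV parser remains the safer choice:

    # minimal sketch, GNU awk only: a field is either unquoted (no commas) or a "quoted, string"
    gawk -v FPAT='([^,]*)|("[^"]*")' '{ print $2 }' Mytest.csv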