I need to use "FPAT" or the equivalent function "patsplit" of gawk. But it seems the installed version of gawk is 3.1.5 on our CentOs server.
I tried updating gawk with these commands:
yum update gawk;
And the server showed: "No Packages marked for Update"
I also tried reinstalling gawk with:
yum install gawk;
server output: "Package gawk-3.1.5-15.el5.x86_64 already installed and latest version "
Where i need gawk 4.0 or above to use those FPAT OR patsplit. And why i need to use them? well i am trying to process a CSV file, and it seems the CSV file has optional quotation and embedded comma.
Example:
From a csv row like this:
this,is,a,"csv,with,embedded coma"
I need to split the fields like this:
this
is
a
"csv,with,embedded comma"
And here is the gawk code:
awk '{patsplit("this,is,a,\"csv,with,embedded comma\"",a,"([^,]*)|(\"([^\"]|\"\")+\"[^,]*)",seps); for(i=0;i<length(a);i++) print a[i];}';
Can anyone help me on this please ?
Try using csvquote in your pipeline to make the data easy for awk to interpret. This is a script I wrote that replaces the commas inside quoted fields with nonprinting characters, and then restores them.
So if your awk command looked like this originally:
awk -F, '{print $3 "," $5}' inputfile.csv
... it can be made to work with csv quoted separators like this:
csvquote inputfile.csv | awk -F, '{print $3 "," $5}' | csvquote -u
For code and more documentation, see https://github.com/dbro/csvquote