I have a CSV file where some of the addresses contain a comma, so I can't simply use:
$ awk -F',' 'length($3) >= 10 {print $3}' schools.csv
Here is an example of my data:
id,name,address
"1","paul","103 avenue"
"2","shawn","108 BLVD, SE"
"3","ryan","MLK drive 1004"
As you can see, the address for id 2 contains a comma, so I have to use gawk 4 and its FPAT variable. So far I've been able to print every field of every row, whether or not it contains a comma, but I only want to print the 3rd column (address) when it is longer than 10 characters. Here is what I have so far:
Contents of awk.awk:
BEGIN {
FPAT = "([^,]+)|(\"[^\"]+\")"
}
{
print "NF = ", NF
for (i = 1; i <= NF; i++) {
printf("$%d = <%s>\n", i, $i)
}
}
$ gawk -f awk.awk schools.csv
The desired output would be just:
108 BLVD, SE (or "108 BLVD, SE" with the quotes)
Well, since you are already using GNU awk, you can use gensub to remove the leading and trailing double quotes before taking the length:
$ gawk 'BEGIN {
FPAT = "([^,]*)|(\"[^\"]+\")"
}
length(gensub(/^"|"$/,"","g",$3))>=10 {
print $3
}' schools.csv
Output:
"103 avenue"
"108 BLVD, SE"
"MLK drive 1004"
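If you also want the output without the double quotes, strip them from $3 before testing the length:
$ gawk 'BEGIN {
FPAT = "([^,]*)|(\"[^\"]+\")"
}
{
gsub(/^"|"$/,"",$3)
if(length($3)>=10)
print $3
}' schools.csv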
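If you can't rely on GNU awk, here is a minimal sketch of a portable alternative for any POSIX awk (mawk, busybox awk, gawk). It swaps FPAT for a different technique and ASSUMES every field is wrapped in double quotes, as in the sample data. Under that assumption the three-character sequence "," (quote, comma, quote) can serve as the field separator, so a comma inside a quoted address is never treated as a delimiter. An unquoted field, or a field that itself contains ",", would break this approach, so the FPAT answer above is the safer choice when gawk is available.

```shell
#!/bin/sh
# Portable sketch: ASSUMES every field is double-quoted, as in the sample.
# FS is the 3-character string "," (quote, comma, quote), so the comma in
# "108 BLVD, SE" is not treated as a field separator.
data='id,name,address
"1","paul","103 avenue"
"2","shawn","108 BLVD, SE"
"3","ryan","MLK drive 1004"'

out=$(printf '%s\n' "$data" | awk -F'","' '
{
    # Drop the quote at the very start and very end of the line;
    # modifying $0 makes awk re-split the fields with the same FS.
    gsub(/^"|"$/, "")
}
NR > 1 && length($3) >= 10 {   # skip the header, then apply the length test
    print $3
}')

printf '%s\n' "$out"
```

With the sample data this prints the three addresses without quotes (103 avenue, 108 BLVD, SE and MLK drive 1004), the same result as the gsub version above.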