I am trying to use awk to remove the trailing commas from each line of in.csv. The awk below does execute but prints all entries on one line.
in.csv:
gene,transcript,cNomen,pNomen,Genomic Description,,,,,,
ASXL1,NM_015338.5,c.1773C>A,p.Y591*,chr20:31022288C>A,,,,,,
ASXL1,NM_015338.5,c.1954G>A,p.G652S,chr20:31022469G>A,,,,,,
Desired:
gene,transcript,cNomen,pNomen,Genomic Description
ASXL1,NM_015338.5,c.1773C>A,p.Y591*,chr20:31022288C>A
ASXL1,NM_015338.5,c.1954G>A,p.G652S,chr20:31022469G>A
Code:
awk -F',' '{ for (i=1; i<=NF; i++) {
if ($i != "") { printf("%s", $i) }
if (i<NF) printf(",")
}
printf("\n") }' in.csv
Why not just
awk '{ sub(/,+$/, "")} 1' in.csv
or equivalently
sed 's/,*$//' in.csv
The flaw with your attempt is that it would still print the comma even if a field was empty. Also, if you fixed that, it would obliterate empty fields from the middle of a line, not just from the end.
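To make the distinction concrete, here is a small sketch. The sample line with an empty field in the middle and the function name strip_trailing_commas are made up for illustration; the awk body is the one-liner suggested above.

```shell
# Hypothetical helper: remove only the run of commas at the very end of each line.
strip_trailing_commas() {
    awk '{ sub(/,+$/, "") } 1' "$@"
}

# The empty field between "a" and "b" is left alone; only the tail is trimmed.
printf 'a,,b,,,\n' | strip_trailing_commas
```

A loop that simply skips every empty field would print a,b for the same input, which silently shifts the later columns to the left.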
Here's a different approach which attempts to preserve the basic idea of your code:
awk -F , '{ for (i=NF; i>0; i--)
if ($i != "") { nf=i; break }
for(i=1; i<nf; ++i) printf "%s,", $i
print $nf }'
We loop backwards from the last field until we find a non-empty one, then print the fields up to that one, each followed by a comma. The last non-empty field itself is printed after the second for loop, followed by a newline instead of a comma.
Demo: https://ideone.com/U5joBb
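One edge case that might matter: if a line contains no non-empty field at all, nf keeps whatever value it had from the previous line. A possible variation, assuming such a line should come out as a blank line, resets nf for each record. The function name trim_trailing_empty and the sample input are only for illustration.

```shell
# Hypothetical wrapper around the backward scan, with nf reset on every line.
trim_trailing_empty() {
    awk -F , '{
        nf = 0                                  # no non-empty field seen yet
        for (i = NF; i > 0; i--)
            if ($i != "") { nf = i; break }     # index of the last non-empty field
        for (i = 1; i < nf; ++i) printf "%s,", $i
        if (nf > 0) print $nf; else print ""    # an all-empty line becomes blank
    }' "$@"
}

# Three lines: interior empty field, commas only, single field.
printf 'a,,b,,\n,,,\nc\n' | trim_trailing_empty
```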
If you are confident that empty fields can only occur at the end, then maybe
awk -F , '{ sep="";
for (i=1; i<=NF; i++) {
if ($i == "") break;
printf "%s%s", sep, $i;
sep=FS }
print "" }'
Notice how we print the separator before the next field, and start with an empty separator; that way, we avoid printing a comma after the last non-empty field, whilst keeping the rest of the code uniform.
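The caveat about empty fields only occurring at the end is worth checking against your data. A quick sketch, with a made-up function name print_until_empty and made-up input lines, shows what the forward scan does in each case:

```shell
# Hypothetical wrapper around the forward scan that stops at the first empty field.
print_until_empty() {
    awk -F , '{ sep = ""
        for (i = 1; i <= NF; i++) {
            if ($i == "") break
            printf "%s%s", sep, $i
            sep = FS }
        print "" }' "$@"
}

printf 'a,b,,,\n' | print_until_empty    # empty fields only at the end
printf 'a,,b,,\n' | print_until_empty    # empty field in the middle: "b" is dropped as well
```

The first call prints a,b; the second prints only a, because the loop breaks at the first empty field it meets.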
(The earlier script could use the same logic as well; I wanted to show some common variations.)
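As a rough sketch of that combination, the backward scan can find the last non-empty field and the separator-first loop can then print everything up to it. The function name trim_with_sep and the sample line are hypothetical; resetting nf per line is an addition so that a line of commas alone yields a blank line.

```shell
# Hypothetical combination: locate the last non-empty field, then print
# fields 1..nf with the separator emitted before every field except the first.
trim_with_sep() {
    awk -F , '{
        nf = 0
        for (i = NF; i > 0; i--)
            if ($i != "") { nf = i; break }
        sep = ""
        for (i = 1; i <= nf; i++) {
            printf "%s%s", sep, $i
            sep = FS
        }
        print ""
    }' "$@"
}

# Interior empty field is kept, trailing ones are removed.
printf 'gene,,pNomen,,,\n' | trim_with_sep
```

This keeps a single printing loop with no special case for the last field, at the cost of one extra pass to find nf.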