I'm fairly new to awk and I'm writing a script to read the contents of a file, process each line, and then append the result to a few files based on that result. The script works on a file containing about 100 lines but fails for a file containing 125k lines. I'm confused about whether it's an issue with the way I'm doing things here, because I've seen awk work fine with larger files.
Here's my code, FileSplitting.awk:
BEGIN { print "Splitting file ";} { print NR; r=int($2/1024); if(r>5){ print $0 >> "testFile";} if(r<=5){ print $0 >> "testFile2";} } END { print "Done"; }
I'm invoking the script like this:
awk -F"," -f FileSplitting.awk test.csv
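For reference, here are a couple of made-up lines in the same shape as my data (the values are invented; all that matters is that the fields are comma-separated and the second field is a number):
fileA,10240,foo
fileB,2048,bar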
The issue is that you're using the wrong output redirection operator: you should be using >, not >>. Awk does not behave the same as shell with respect to these two operators: in awk, > truncates the output file only on the first write in a given script and appends on every write after that, while >> never truncates, not even a file left over from an earlier run. See your awk man page for how those operators work, and change your script to:
BEGIN { print "Splitting file ";} { print NR; r=int($2/1024); if(r>5){ print $0 > "testFile";} if(r<=5){ print $0 > "testFile2";} } END { print "Done"; }
to get it to work, and then clean it up to:
BEGIN { print "Splitting file " }
{ print NR; print > ("testFile" (int($2/1024)>5?"":"2")) }
END { print "Done" }
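(The parenthesized expression builds the output file name: it appends "2" when int($2/1024) is 5 or less, so the two if branches collapse into a single print.)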
You do NOT need to close the files after every write: awk keeps each output file open for the life of the script, which is also why > only truncates on the first write.
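If you want to see the difference for yourself, here's a quick sketch you can paste into a shell (the file name f.txt is just an example):
awk 'BEGIN { print "a" > "f.txt"; print "b" > "f.txt" }'  # f.txt holds a and b: truncated on first write, appended after
awk 'BEGIN { print "c" > "f.txt" }'                       # f.txt holds only c: each new awk run truncates again
awk 'BEGIN { print "d" >> "f.txt" }'                      # f.txt holds c and d: >> never truncates
cat f.txt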
In response to @Aryan's comment below, here are the > and >> awk vs shell equivalents:
1) awk's >
awk:
{ print > "foo" }
shell equivalent:
> foo
while IFS= read -r var
do
    printf "%s\n" "$var" >> foo
done
2) awk's >>
awk:
{ print >> "foo" }
shell equivalent:
while IFS= read -r var
do
    printf "%s\n" "$var" >> foo
done
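And a quick way to verify that mapping yourself; run this in a directory with no file named foo (the file name and input are arbitrary):
printf 'x\ny\n' | awk '{ print > "foo" }'
printf 'x\ny\n' | awk '{ print > "foo" }'
cat foo   # 2 lines: each awk run truncated foo on its first write, like the "> foo" before the shell loop
printf 'x\ny\n' | awk '{ print >> "foo" }'
cat foo   # 4 lines: >> appended to the existing contents, like the shell loop alone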