Tags: awk, gawk, nawk

awk failing to read contents of a large file


I'm fairly new to awk, and I'm writing a script that reads the contents of a file, processes each line, and appends the result to one of a few files based on the outcome. The script works on a file containing about 100 lines but fails on a file containing 125k lines. I'm confused whether it's an issue with the way I'm doing things here, because I've seen awk work fine with larger files.

Here's my code: FileSplitting.awk

BEGIN { print "Splitting file ";} { print NR; r=int($2/1024); if(r>5){ print $0 >> "testFile";} if(r<=5){ print $0 >> "testFile2";} } END { print "Done"; }

I'm invoking the script like this:

awk -F"," -f FileSplitting.awk test.csv

Solution

  • The issue is that you're using the wrong output redirection operator: you should be using > rather than >>. Awk does not treat these two operators the same way the shell does: in awk, > truncates the file the first time it is opened during the script's execution and appends on every subsequent print, while >> never truncates, so output accumulates across runs. See man awk for how those operators work in awk, and change your script to:

    BEGIN { print "Splitting file ";} { print NR; r=int($2/1024); if(r>5){ print $0 > "testFile";} if(r<=5){ print $0 > "testFile2";} } END { print "Done"; }
    

    to get it to work, and then clean it up to:

    BEGIN { print "Splitting file " }
    { print NR; print > ("testFile" (int($2/1024)>5?"":"2")) }
    END { print "Done" }
    

    You do NOT need to close the files after every write.
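
    That said, if a script writes to many *distinct* files (say, one per key), a non-gawk awk can hit the per-process open-file limit on large inputs; only in that case do you close() files as you go. A sketch of that pattern, with a hypothetical one-file-per-key split:

        # Hypothetical: one output file per value of $1. After close(),
        # reopening with ">" would truncate, so ">>" is the right operator here.
        { out = $1 ".csv"; print >> out; close(out) }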

    In response to @Aryan's comment below, here are the shell equivalents of awk's > and >>:

    1) awk's >

    awk:
        { print > "foo" }
    
    shell equivalent:
    
        > foo
        while IFS= read -r var
        do
            printf "%s\n" "$var" >> foo
        done
    

    2) awk's >>

    awk:
        { print >> "foo" }
    
    shell equivalent:
    
        while IFS= read -r var
        do
            printf "%s\n" "$var" >> foo
        done
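
    A quick way to see the difference from the shell (a sketch, assuming an empty scratch directory): run each one-liner twice and count the lines.

        printf '1\n2\n' > in.txt
        awk '{ print > "out" }' in.txt; awk '{ print > "out" }' in.txt
        wc -l out     # 2 lines: ">" re-truncates at the first write of each run
        awk '{ print >> "out2" }' in.txt; awk '{ print >> "out2" }' in.txt
        wc -l out2    # 4 lines: ">>" never truncates, so runs accumulate

    The counts differ only because of truncation at open time; within a single run, both operators append on every print after the first.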