Tags: memory, awk, cygwin

awk keeps producing results for the old file, not the new one


 awk 'NR==1 {n=$2} {file = sprintf("chr1_50kb_%.5d", ($2-n)/50000); if (file != last_file) {close(last_file); last_file = file}; print > file}' file2

I am running Cygwin on Windows 7. I ran this one-liner on file1 and then tried to run it on file2, but it keeps giving me the results for file1, not file2. file1 and file2 are in separate folders, and each is about 500k lines long.

How do I go about fixing this?

First lines of file1:

chr19 3000118 + 0 0 0 0
chr19 3000119 - 0 0 0 0
chr19 3000315 + 0 0 0 0
chr19 3000316 - 0 0 0 0
chr19 3000602 + 0 0 0 0
chr19 3000603 - 0 0 0 0
chr19 3000718 + 0 0 0 0
chr19 3000719 - 0 0 0 0
chr19 3000720 + 0 0 0 0
chr19 3000721 - 0 0 0 0

First lines of file2:

chr1 3000573 + 0 0 1 0 1 0
chr1 3000574 - 0 0 0 0 0 0
chr1 3000725 + 1 0 1 0 2 0
chr1 3000726 - 0 0 0 0 0 0
chr1 3000900 + 1 1 0 1 1 2
chr1 3000901 - 0 0 0 0 0 0
chr1 3001345 + 1 0 1 0 2 0
chr1 3001346 - 1 0 0 0 1 0
chr1 3001393 + 0 0 0 0 0 0
chr1 3001394 - 2 0 1 0 3 0

It seems to be a result of the overlapping $2 values in file1 and file2, since the last files created (where the $2 values do not overlap) contain the results I am looking for.


Solution

  • Per the reference manual, the ">" print redirection erases the previous contents of the output file before the first write to it. Note that your file1 and file2 runs both redirect to the same output filenames (the name depends only on which 50000-wide block $2 falls in, counted from the first record). To avoid losing any of your output that way, I'd suggest changing ">" to ">>", which appends to any existing file. In that case you'll probably need to delete (or empty) the output files between runs.

    Looking again, perhaps you should be encoding $1 in the filename as well? Otherwise those chr19 records end up in files named chr1_... .
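The second suggestion could look roughly like the sketch below. It is only an illustration, not a tested fix for the original data: the scratch directory, the three-line sample input (the third record is made up so that a second block is created), and the explicit int() call are all assumptions added here. The only change to the naming logic is that the prefix is taken from $1 instead of the literal "chr1", so runs on different chromosomes no longer share output filenames.

```shell
# Work in a scratch directory so output files from earlier runs cannot interfere.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input: two records from the question's file1,
# plus one invented record that falls into the next 50000-wide block.
printf '%s\n' \
  'chr19 3000118 + 0 0 0 0' \
  'chr19 3000119 - 0 0 0 0' \
  'chr19 3050200 + 0 0 0 0' > file1

# Same splitting logic as the one-liner, but the filename prefix comes from $1.
awk 'NR == 1 { n = $2 }
{
    file = sprintf("%s_50kb_%.5d", $1, int(($2 - n) / 50000))
    if (file != last_file) {
        if (last_file != "") close(last_file)   # nothing to close on the first record
        last_file = file
    }
    print > file
}' file1

ls
```

With this sample input, the first two records should land in chr19_50kb_00000 and the invented third record in chr19_50kb_00001. Running the same command on a chr1 file would then write to chr1_50kb_... names and leave the chr19 files alone.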