bash, awk, overwrite

awk overwriting files in a loop


I am trying to work through a set of files. There are 4-5 files for each month of a 2-year period, with 1000+ stations in them. I am trying to separate them so that I have one file per station_no (station_no = $1).

I thought this would be easy and simply went with:

awk -F, '{ print > $1".txt" }' *.csv

which I've tested with one file, and it works fine. However, when I run it over the full set it creates the .txt files, but there is nothing in them.

I've now tried putting it in a loop to see if that works:

#!/bin/bash
# program to extract stations from orig files

for file in $(ls *.csv)
do
    awk -F, '{print > $1".txt" }' $file
done

It works in that it loops through the files, but it keeps overwriting the output files when it moves on to the next month.

How do I stop it overwriting and instead just add to the end of the .txt file with that name?


Solution

  • You are using print > file, which truncates the output file the first time it is opened in each awk run; since your loop starts a new awk process for every CSV, each run wipes what the previous one wrote. Use >> instead, so that it appends to the existing content.

    Also, there is no need to loop over the files and call awk once for each of them. Instead, pass the whole set of files to awk like this:

    awk -F, '{print >> ($1".txt")}' *.csv
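
    For example, with two (hypothetical) monthly files like these, every station's rows end up appended to its own .txt file across months:

    $ cat jan.csv
    ST001,2020-01-01,3.2
    ST002,2020-01-01,1.7
    $ cat feb.csv
    ST001,2020-02-01,4.1

    $ awk -F, '{print >> ($1".txt")}' jan.csv feb.csv
    $ cat ST001.txt
    ST001,2020-01-01,3.2
    ST001,2020-02-01,4.1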
    

    Note, however, that it is worth saying a little about how awk keeps files open for writing. If you say awk '{print > "hello.txt"}' file, awk opens hello.txt once, truncates it at that point, and keeps it open until processing finishes (or until you call close()). In your current approach a new awk process starts for every CSV, so the output files get truncated again and again; in the suggested single invocation, each station file is opened (and truncated) only once and stays open until the last CSV has been processed. Thus, in that case a single > suffices:

    awk -F, '{print > $1".txt"}' *.csv
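
    One practical caveat, assuming you may not be running gawk: with 1000+ stations, some awk implementations limit how many files can be open for writing at once and may fail rather than juggle them. A sketch that avoids this by closing each output file right after writing to it:

    # Append one line at a time and close the file immediately,
    # so at most one output file is held open at any moment.
    awk -F, '{ f = $1 ".txt"; print >> f; close(f) }' *.csv

    This is slower, but it works regardless of the open-file limit.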
    

    For the detail on ( file ), see the comments below by Ed Morton; I cannot explain it better than he does :)
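
    Briefly, and as far as I understand that point: POSIX does not pin down how an unparenthesized redirection target like $1".txt" is parsed, so different awks may read it differently; wrapping the concatenation in parentheses makes the intent explicit everywhere:

    # Ambiguous: is the redirection target $1, or the concatenation $1 ".txt"?
    awk -F, '{ print >> $1 ".txt" }' *.csv

    # Unambiguous: the parentheses make the whole concatenation the target.
    awk -F, '{ print >> ($1 ".txt") }' *.csv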