I am trying to look through a set of files. There are 4-5 files for each month over a two-year period, with 1000+ stations in each. I am trying to split them so that I end up with one file per station_no (station_no = $1).
I thought this would be easy and simply went with:
awk -F, '{ print > $1".txt" }' *.csv
which I've tested with one file, and it works fine. However, when I run it over all the files, it creates the .txt files, but they are empty.
I've now tried putting it in a loop to see if that works:
#!/bin/bash
# Extract stations from the original files
for file in *.csv
do
    awk -F, '{ print > $1".txt" }' "$file"
done
It works as it loops through the files, but it keeps overwriting the output files when it moves to the next month.
How do I stop it from overwriting, and instead just append to the end of the .txt file with that name?
You are saying print > file, which truncates the output file each time a new awk process opens it. Use >> instead, so that it appends to the previous content.
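As a sanity check, here is a minimal sketch of that difference using two made-up one-row data files (month1.csv and month2.csv are placeholders, not your real data):

```shell
# Made-up sample data: one station, one row per monthly file
printf 'S1,jan\n' > month1.csv
printf 'S1,feb\n' > month2.csv

# With ">": each awk process truncates S1.txt when it opens it,
# so only the last file's row survives
for f in month1.csv month2.csv; do
    awk -F, '{ print > ($1 ".txt") }' "$f"
done
cat S1.txt        # only "S1,feb"

# With ">>": each process appends, so the rows accumulate
rm S1.txt
for f in month1.csv month2.csv; do
    awk -F, '{ print >> ($1 ".txt") }' "$f"
done
cat S1.txt        # "S1,jan" then "S1,feb"
```

The > version keeps only the last month's rows because every fresh awk process truncates S1.txt on first open, which is exactly the overwriting you are seeing.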
Also, there is no need to loop through all the files and call awk for each one. Instead, provide the whole set of files to awk like this:
awk -F, '{print >> ($1".txt")}' *.csv
Note, however, that we need to talk a little about how awk keeps files open for writing. If you say awk '{print > "hello.txt"}' file, awk keeps hello.txt open until it finishes processing. In your loop, a new awk process starts and exits for every input file, so each invocation truncates the output files again; with the single invocation above, each output file is opened (and truncated) once and then stays open until the last input file is processed. Thus, in this case a single > suffices:
awk -F, '{print > ($1".txt")}' *.csv
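A minimal sketch of that single-process behavior, again with made-up placeholder files: within one awk run, each output file is truncated only when it is first opened, so a plain > collects every matching line from all inputs.

```shell
# Made-up sample data: two stations in the first file, one in the second
printf 'S1,jan\nS2,jan\n' > month1.csv
printf 'S1,feb\n'         > month2.csv

# One awk process over both files: each output file is truncated
# only on first open, then stays open, so ">" loses nothing
awk -F, '{ print > ($1 ".txt") }' month1.csv month2.csv

cat S1.txt   # "S1,jan" then "S1,feb"
cat S2.txt   # "S2,jan"
```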
For the details on the ( file ) parenthesization, see the comments below by Ed Morton; I cannot explain it better than he does :)