Tags: bash, awk, while-loop

Trying to run an awk command on each line in a file, and replace the current line with the results of the awk


I am trying to loop through a file and use each line to search a separate CSV file (specifically its 2nd column), returning the 4th column of the matching row for each entry. I then want to replace the current line with that result. The input file looks like this:

Ant
Bat
Carp
Dog

At the moment I am using this code:

while read -r line
do
awk -v line="\"$line\"" -F, '$2 ~ line' search.csv | awk -F, '{print $4}' >> $filename
done < $filename

This appends each result to the end of the file, so I get this output:

Ant
Bat
Carp
Dog
"Insect"
"Mammal"
"Fish"
"Mammal"

How do I get just the second list (the words in quotes)?


Edited:

Here is sample data from search.csv:

"A0001","Dog","Canine","Mammal","4","Y","N"

Update #1

The search.csv file has each entry in quotes. I have edited my awk like this:

awk -F, 'NR==FNR {wrds["\""$1"\""]; next} $2 in wrds {print ($4 > "temporary.txt")}' "filename.txt" search.csv

And now it prints a 0 for each line to the screen instead. The temporary.txt file is still empty:

0
0
0
0

Solution

  • Looking at OP's latest code attempt:

    awk -F, 'NR==FNR {wrds["\""$1"\""]; next} $2 in wrds {print ($4 > "temporary.txt")}' "filename.txt" search.csv
    

    OP states this generates a 0 at the terminal and temporary.txt is empty. NOTE: When I run OP's code it does generate a 0 at the console but it doesn't even create a file named temporary.txt.

    The primary issue is with the following:

    print ($4 > "temporary.txt")
    

    Where:

    • contents of the parens are processed first, so ...
    • ($4 > "temporary.txt") is processed as a comparison/conditional, ie, comparing the 4th field to see if it's 'greater than' the literal string "temporary.txt"
    • in this case $4 is "Mammal" (the double quotes are part of the field) and awk's string comparison finds that it is not > "temporary.txt", so the result is 'false', which awk represents as 0, so ...
    • the print sends a 0 (aka 'false') to the console ...
    • and, of course, nothing is ever written to any file (ie, the file named temporary.txt is not created let alone written to)
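
    The comparison behaviour described in the bullets above can be checked with a small sketch. It assumes a scratch directory and a one-line search.csv copied from the question's sample; the variable names (workdir, false_out, true_out) are made up for illustration:

```shell
# Scratch directory holding a one-record search.csv (sample row from the question).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' '"A0001","Dog","Canine","Mammal","4","Y","N"' > search.csv

# Inside parentheses, > is a comparison: the question's expression prints 0 ...
false_out=$(awk -F, '{print ($4 > "temporary.txt")}' search.csv)
echo "compared with \"temporary.txt\": $false_out"

# ... while comparing against the empty string prints 1,
# because any non-empty string sorts after the empty string.
true_out=$(awk -F, '{print ($4 > "")}' search.csv)
echo "compared with the empty string: $true_out"

# Neither command performed a redirection, so no output file exists.
if [ -e temporary.txt ]; then
    echo "temporary.txt exists"
else
    echo "temporary.txt was not created"
fi
```

    Seeing the printed value flip between 0 and 1 is a quick way to confirm that the parenthesized > is being evaluated as a comparison and not as a redirection.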

    The quick fix is to remove the parens so that the print sends $4 to a file named temporary.txt, ie:

    $ awk -F, 'NR==FNR {wrds["\""$1"\""]; next} $2 in wrds {print $4 > "temporary.txt"}' "filename.txt" search.csv
    $              <<=== no output, no '0'
    $ cat temporary.txt
    "Mammal"
    

    Naming the output file from within the awk script is typically done when the script needs to send output to a dynamically generated set, or variable number, of output files.
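
    As an illustration of that dynamic case (not something OP asked for), here is a minimal sketch that writes one file per distinct value of the 4th column. The extra CSV rows, the class and out variable names, and the .txt naming scheme are all assumptions made for the example:

```shell
# Scratch directory with hypothetical rows in the same layout as search.csv.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' \
  '"A0001","Dog","Canine","Mammal","4","Y","N"' \
  '"A0002","Ant","Formicidae","Insect","6","N","N"' \
  '"A0003","Bat","Chiroptera","Mammal","2","N","Y"' > search.csv

awk -F, '{
    class = $4
    gsub(/"/, "", class)     # strip the quotes so the file name is clean
    out = class ".txt"       # build the file name first, then redirect to it
    print $2 > out           # one output file per distinct 4th-column value
}' search.csv

cat Mammal.txt
cat Insect.txt
```

    Building the file name in a variable before the redirection sidesteps the ambiguity some awk implementations have with an unparenthesized concatenation after >. With the rows above, Mammal.txt should hold "Dog" and "Bat", and Insect.txt should hold "Ant".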

    In this case since all output is going to the same temporary.txt file, the typical approach is to define the output file on the command line (ie, outside of the awk script); this tends to make the awk script a bit cleaner, eg:

    awk -F, 'NR==FNR {wrds["\""$1"\""]; next} $2 in wrds {print $4}' filename.txt search.csv > temporary.txt
                                                          ^^^^^^^^                             ^^^^^^^^^^^^^
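
    One thing to keep in mind: the command above prints matches in the order they appear in search.csv, not in the order of filename.txt. If the goal is still to replace each line of filename.txt with its looked-up value, a possible sketch is to read search.csv first and build a lookup table. The extra CSV rows, the class array and the key variable are assumptions for the example; it uses an exact match on the 2nd column, and it silently drops lines that have no match:

```shell
# Scratch directory with the question's input list and hypothetical CSV rows.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' Ant Bat Carp Dog > filename.txt
printf '%s\n' \
  '"A0001","Dog","Canine","Mammal","4","Y","N"' \
  '"A0002","Ant","Formicidae","Insect","6","N","N"' \
  '"A0003","Carp","Cyprinidae","Fish","0","N","N"' \
  '"A0004","Bat","Chiroptera","Mammal","2","N","Y"' > search.csv

# First file (search.csv): remember column 4 under the quoted name in column 2.
# Second file (filename.txt): quote the word and print its stored value, if any.
awk -F, 'NR==FNR {class[$2] = $4; next}
         {key = "\"" $1 "\""}
         key in class {print class[key]}' search.csv filename.txt > temporary.txt \
  && mv temporary.txt filename.txt    # replace the original only if awk succeeded

cat filename.txt
```

    With the sample data above this should leave filename.txt containing "Insect", "Mammal", "Fish" and "Mammal", one per line, matching the order of the original list.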