awk pattern from external txt file


Currently I have a folder with four files: a script.awk script file, the gawk.exe binary that runs the script, a report.csv file from which the script reads its lines, and a names.txt file that contains usernames, one per line. Some names contain spaces, so they are occasionally two or more separate words, but there is strictly one username per line in the txt file. When I run my awk script, I store some of the data from the csv file in a variable called "name".

Now let's say name = "Pete". I would like to check whether the names.txt file contains the username Pete. The match has to be exactly "Pete", not "Pete Sampras" etc., and when a match is found, I would like to take further action.

The txt file contains about 500 lines like these:

leopanato
colan321
kamon mdp
BELLAM42

Solution

    Basic question

    Read the names into an array, and use that:

    gawk -f script.awk names.txt report.csv
    

    and script.awk might contain:

    FNR == NR { names[$0]++; next }
    {
        …code to determine name…
        if (name in names)
        {
           …actions for matched name…
        }
    }
    

    The FNR == NR line processes the first file, reading the names from that file into the array called, imaginatively, names. The next means that the rest of the code is not processed while the first file (names.txt) is read.

    Once the code is reading the second file, FNR (the record number within the current file) no longer equals NR (the overall record number), so the first rule is skipped. The action processes the line from report.csv. You've not shown how you're handling the CSV material, which is fine, since you say you load a name into name. The if statement checks whether the value in name is an index in the array names. Because the whole line ($0) was used as the index, only an exact match succeeds: "Pete" does not match an entry "Pete Sampras". If the name is found, the appropriate actions are executed.
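    As a minimal, self-contained sketch of the exact-match behaviour, the following shell snippet builds two small sample files and runs the same two-rule script over them. The file contents and the CSV layout (username in the first comma-separated field) are assumptions made for illustration only; substitute your own code for determining name. It calls awk, and gawk should behave the same way for this script.

```shell
#!/bin/sh
# Hypothetical sample data; the real report.csv layout is not shown in the question.
tmp=$(mktemp -d)
printf '%s\n' 'leopanato' 'kamon mdp' 'Pete Sampras' > "$tmp/names.txt"
printf '%s\n' 'Pete,10' 'kamon mdp,20' 'Pete Sampras,30' 'kamon,40' > "$tmp/report.csv"

result=$(awk -F, '
    # First file (names.txt): store each whole line as an array index.
    FNR == NR { names[$0]++; next }
    # Second file (report.csv): look the name up by exact index match.
    {
        name = $1    # assumption: the username is the first CSV field
        if (name in names)
            print name ": listed"
        else
            print name ": not listed"
    }
' "$tmp/names.txt" "$tmp/report.csv")

printf '%s\n' "$result"
# Pete: not listed
# kamon mdp: listed
# Pete Sampras: listed
# kamon: not listed
rm -rf "$tmp"
```

    Note that "Pete" and "kamon" are reported as not listed even though "Pete Sampras" and "kamon mdp" are in names.txt, because the in operator tests for an identical index rather than a substring.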

    Extended question

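    You can look at the ARGV array and length(ARGV), and also at FILENAME, to deduce which file you are dealing with. ARGV[0] holds the program name, so three file arguments give a length of 4; the built-in variable ARGC holds the same count. Adapting the code: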

    BEGIN { if (length(ARGV) != 4) { printf "Usage: %s good.txt bad.txt records.csv\n", ARGV[0]; exit(1) } }
    FILENAME == ARGV[1] { good[$0]++; next }
    FILENAME == ARGV[2] { bad[$0]++; next }
    {
        …code to determine name…
        if (name in good) { …actions for good names… }
        if (name in bad)  { …actions for bad  names… }
    }
    

    Note that this coding scheme allows the same name to be both good and bad at the same time, in which case both sets of actions run. You could decide that people should be treated as good even if they're also listed as bad, or vice versa. You could even check that there are no duplicates between the good and bad lists if you wanted to.
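    One possible way to run that duplicate check, sketched under the assumption that the two lists are called good.txt and bad.txt as in the usage message above (the sample contents are invented for the demonstration):

```shell
#!/bin/sh
# Hypothetical sample lists; colan321 deliberately appears in both.
tmp=$(mktemp -d)
printf '%s\n' 'leopanato' 'colan321' > "$tmp/good.txt"
printf '%s\n' 'BELLAM42' 'colan321' > "$tmp/bad.txt"

dupes=$(awk '
    # First file: remember every good name.
    FILENAME == ARGV[1] { good[$0]++; next }
    # Second file: report any bad name that was already seen as good.
    FILENAME == ARGV[2] && ($0 in good) { print "in both lists: " $0 }
' "$tmp/good.txt" "$tmp/bad.txt")

printf '%s\n' "$dupes"
# in both lists: colan321
rm -rf "$tmp"
```

    If you would rather let one list take precedence than report duplicates, changing the second if in the main script to else if would treat a name found in good as good only.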