bash awk cut

Pass variable to awk or store only the first field


I have a data file that I need to filter based on the value of the first field of the first line (row 0, column 0). For example, with this data:

123 test1
123 test2
321 test3
321 test4
451 test5

I need to generate this output:

123 test1
123 test2

So I need some way to store only the first field of the first line and match against it in awk. The problem is that awk runs the code for every line, so the variable is always overwritten. Is the solution to cut the first field, store it in a variable, and match against that in awk? If so, can you provide an example?

The problem with this code is that it doesn't print the first match, and it updates field on every mismatch, so it also prints other, undesired matches.

awk -F"  " '
$1 == field {
        print; 
}
$1 != field {
        field = $1
}
' data.txt > awkOutput.txt

Solution

  • awk's default field separator is any run of whitespace, so you don't need to set -F" ". Since you are only interested in the first field of the first line, use the NR variable, which holds the current line number.

    The following awk one-liner does what you need:

    $ awk 'NR==1{ field = $1 }$1==field' file
    123 test1
    123 test2
    

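    NR==1 is a pattern that matches only the first line, so its action runs once and sets the variable field to $1. The next pattern, $1==field, checks whether the first column equals that variable. It has no action of its own, and in awk a true pattern without an action triggers the default action, which prints the line. Because field is assigned before the comparison runs, the first line itself is printed too.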
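
The question also asks whether the first field could be cut out, stored in a shell variable, and passed to awk. Here is a minimal sketch of that approach, assuming single-space-separated data like the sample above. The names `data`, `key`, and `result` are made up for illustration, and the temporary file only stands in for the real data file.

```shell
# Recreate the sample data from the question in a temporary file.
data=$(mktemp)
printf '%s\n' '123 test1' '123 test2' '321 test3' '321 test4' '451 test5' > "$data"

# Cut the first field out of the first line and keep it in a shell variable.
key=$(head -n 1 "$data" | cut -d' ' -f1)

# Pass the shell variable to awk with -v; the bare pattern prints matching lines.
result=$(awk -v field="$key" '$1 == field' "$data")
printf '%s\n' "$result"

# Clean up the temporary file.
rm -f "$data"
```

This prints the same two lines as the one-liner. It reads the file twice and spawns extra processes, so the NR==1 version is usually the simpler choice. The -v form is mainly useful when the value to match comes from somewhere other than the file itself.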