I have a data file that I need to filter based on the value of the first field (row 0 column 0). For example, with this data:
123 test1
123 test2
321 test3
321 test4
451 test5
I need to generate this output:
123 test1
123 test2
So I need some way to store only the first field of the first line and match against it in awk. The problem is that awk code is run for every line, so that variable is always overwritten. Is the solution to cut the first field, store it in a variable, and then match against that in awk? If so, can you provide an example of that?
The problem with the code below is that it doesn't print the first match, and it updates field whenever the first column changes, so it goes on to print other, undesired matches.
awk -F" " '
$1 == field {
print;
}
$1 != field {
field = $1
}
' data.txt > awkOutput.txt
awk's default field separator is any run of blanks (spaces and tabs), so you don't need to set -F" ". Since you are only interested in the first field of the first line, use the NR variable, which holds the current line number.
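As a quick check of both points, here is a minimal sketch (the sample line and the variable names line, default_split, and explicit_split are made up for illustration). It prints the field count NF and the line number NR for one line whose fields are separated by several spaces and a tab, once with the default separator and once with -F" ":

```shell
# A line with mixed spaces and a tab between its three fields.
line=$(printf 'a   b\tc')

# Default separator: a run of blanks counts as a single delimiter.
default_split=$(printf '%s\n' "$line" | awk '{ print NF, NR }')

# An explicit -F" " gives the same split, so the option is redundant.
explicit_split=$(printf '%s\n' "$line" | awk -F" " '{ print NF, NR }')

echo "$default_split"    # 3 1
echo "$explicit_split"   # 3 1
```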
The following awk one-liner does what you need:
$ awk 'NR==1{ field = $1 }$1==field' file
123 test1
123 test2
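NR==1 is a pattern that matches only the first line, so its action runs once and sets the variable field to $1. The next pattern checks whether the first column is equal to our variable. If it is, the pattern evaluates to true, and in awk a true pattern with no action triggers the default action of printing the line. Because the assignment comes first, the first line is itself compared against field and printed.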
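For anyone who wants to try this end to end, here is a sketch of the same logic with the implicit print written out. It assumes the sample data from the question; the temporary file and the shell variable names data and result are only for illustration.

```shell
# Recreate the sample data from the question in a temporary file.
data=$(mktemp)
printf '%s\n' '123 test1' '123 test2' '321 test3' '321 test4' '451 test5' > "$data"

# Same logic as the one-liner, with the default print made explicit.
result=$(awk '
    NR == 1     { field = $1 }   # remember the first field of the first line
    $1 == field { print }        # print every line whose first field matches
' "$data")

printf '%s\n' "$result"
rm -f "$data"
```

The order of the two rules matters: awk evaluates them top to bottom for each line, so field is already set when the first line reaches the comparison.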