
Awk: Strip all characters except whitelisted ones from a column


   awk 'length($1)==3 && length($2)==3 {print $1, $2 "\t", $5}' file.txt

I am trying to print column 5 reduced to a single character.

That character can be A, B, C, or D; everything else in $5 should be removed.

The values of $5 in file.txt are:

112C
222F
B212
F2334
C23
A123

I want the output to be

 C

 B

 C
 A

Solution

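  • To remove all characters except A, B, C, and D from $5, use gsub(/[^ABCD]/, "", $5). The bracket expression [^ABCD] matches any single character that is not one of those four letters, and gsub replaces every match with the empty string, modifying $5 in place.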

    Applied to your command:

    awk 'length($1)==3 && length($2)==3 { 
      gsub(/[^ABCD]/, "", $5);
      print $1, $2 "\t" $5
    }' file.txt
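
As a quick check of the gsub call on its own, here is a minimal sketch that feeds it only the six $5 values from the question. The other columns of file.txt are not shown, so this assumes a single-column input and works on $1 instead of $5; the function name strip_to_abcd and the col variable are made up for illustration.

```shell
# Hypothetical helper: keep only A, B, C, D in the chosen column.
# col=1 here because the sample below has a single column;
# the real file.txt would use col=5.
strip_to_abcd() {
  awk -v col=1 '{ gsub(/[^ABCD]/, "", $col); print $col }'
}

# The six $5 values from the question, one per line.
result=$(printf '%s\n' 112C 222F B212 F2334 C23 A123 | strip_to_abcd)
printf '%s\n' "$result"
```

Rows such as 222F and F2334 contain none of the four letters, so they come out as empty lines, which matches the blank lines in the desired output. Note that a value holding several whitelisted letters (say A1B) would keep all of them, so this only yields a single character when the input has at most one of A, B, C, or D per value.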