Extract lines based on a column matching one of multiple values


I have some files containing the following data:

 160-68 160 68 B-A 0011 3.80247
 160-68 160 68 B-A 0022 3.73454
 160-69 160 69 B-A 0088 2.76641
 160-69 160 69 B-A 0022 3.54446
 160-69 160 69 B-A 0088 4.24609
 160-69 160 69 B-A 0011 3.97644
 160-69 160 69 B-A 0021 1.82292

I need to extract the lines whose 5th column matches any of the values in an array (the values can be negative, e.g. -12222).

Expected output for the values [0088, 0021]:

160-69 160 69 B-A 0088 2.76641
160-69 160 69 B-A 0088 4.24609
160-69 160 69 B-A 0021 1.82292

I'm currently doing this with Ruby, but is there a way to do it faster with Bash?

Thanks.


Solution

  • Bash is unlikely to be faster than Ruby, because Bash's own loops and string handling are slow. I'd pick awk or Perl instead:

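    awk -v values="0088 0021" '
        BEGIN {
            # turn the space-separated list into a lookup set b
            n = split(values, a)
            for (i=1; i<=n; i++) b[a[i]]=1
        }
        # print the line when its 5th field is one of the keys of b
        $5 in b
    ' file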
    
    perl -ane 'BEGIN {%v = ("0088"=>1, "0021"=>1)} print if $v{$F[4]}' file