grep pattern on all columns but one


I have a tab-delimited file with ~10,000 columns. I want to keep only the rows where some of the columns contain a pattern, but I don't want grep to check the first column.

$ cut -f2,1691-1725 myfile.txt
1402138 2331    2331
1402422 181     630
1402795 1401            1401
1405425 2331
1405771 1727    1727
1406169 2331    2331
1406475 2252    2252
1408259 1744    1744

I tried this:

cut -f2,1691-1725 myfile.txt | grep -E '^140|^141|^143|^144|^145|^146|^148|^149' > myoutput.txt

but it keeps every row, because grep also matches against the first column.

Desired output:

1402795 1401            1401

I looked for a way to do this with awk, but couldn't find an easy one that doesn't require listing every column.

Thank you


Solution

  • If the columns you want to test always start at the same character position, you could use ^.{n} to skip the first n characters of the line, then start matching.

    $ cat myfile.txt
    1402138 2331    2331
    1402422 181     630
    1402795 1401            1401
    1405425 2331
    1405771 1727    1727
    1406169 2331    2331
    1406475 2252    2252
    1408259 1744    1744
    
    $ cat myfile.txt | grep -E '^.{24}(140|141|143|144|145|146|148|149)'
    1402795 1401            1401
    

    The pattern above first matches any 24 characters (except newline) from the start of the line, then tries to match (140|141|143|144|145|146|148|149) starting at character position 25.

    In this scenario (140|141|143|144|145|146|148|149) could be simplified to 14[01345689] as well.
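
If the columns do not line up at a fixed character position, matching per field may be more robust. The following is a minimal sketch, assuming the file really uses tab characters as delimiters and that only the first field should be skipped. The file name sample.tsv and the function names filter_skip_first and grep_skip_first are hypothetical, chosen only for illustration.

```shell
# Hypothetical sample data: tab-separated, first field is an ID that must not be tested.
printf '1402138\t2331\t2331\n1402422\t181\t630\n1402795\t1401\t\t1401\n1405425\t2331\n' > sample.tsv

# awk variant: loop over fields 2..NF and print the row on the first match,
# so no column has to be listed by hand.
filter_skip_first() {
    awk -F'\t' '{
        for (i = 2; i <= NF; i++)
            if ($i ~ /^14[01345689]/) { print; next }
    }' "$@"
}

# grep variant: require a literal tab right before the pattern. The first
# field is never preceded by a tab, so it can never match.
grep_skip_first() {
    grep -E "$(printf '\t')14[01345689]" "$@"
}

filter_skip_first sample.tsv   # prints only the 1402795 row
grep_skip_first sample.tsv     # prints only the 1402795 row
```

Both variants anchor the pattern at the start of a field rather than at a character offset, so they keep working when field widths vary. The awk loop is also easy to restrict to a range of columns by changing the loop bounds.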