I have a tab-delimited file with ~10,000 columns. I want to filter rows where some columns contain a pattern, but I don't want grep to check the first column.
$ cut -f2,1691-1725 myfile.txt
1402138 2331 2331
1402422 181 630
1402795 1401 1401
1405425 2331
1405771 1727 1727
1406169 2331 2331
1406475 2252 2252
1408259 1744 1744
I tried this:
cut -f2,1691-1725 myfile.txt | grep -E '^140|^141|^143|^144|^145|^146|^148|^149' > myoutput.txt
but it keeps all rows because grep also matches against the first column.
Desired output:
1402795 1401 1401
I searched for a way with awk but can't find an easy one without listing each column.
Thank you
If your data always starts at the exact same character column, you could use ^.{n} to skip n characters, then start matching.
$ cat myfile.txt
1402138 2331 2331
1402422 181 630
1402795 1401 1401
1405425 2331
1405771 1727 1727
1406169 2331 2331
1406475 2252 2252
1408259 1744 1744
$ grep -E '^.{8}(140|141|143|144|145|146|148|149)' myfile.txt
1402795 1401 1401
The pattern above first matches any 8 characters (anything except a newline) from the start of the line, which here covers the 7-digit first field plus its tab, then tries to match (140|141|143|144|145|146|148|149)
starting at character position 9.
In this scenario, (140|141|143|144|145|146|148|149) could be simplified to 14[01345689] as well.
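Since you mentioned awk: a sketch that checks every retained field except the first against the pattern, so it does not depend on fixed character positions (this assumes the cut output shown above, where field 1 is your original column 2 and fields 2+ are columns 1691-1725).

```shell
cut -f2,1691-1725 myfile.txt |
awk '{
  for (i = 2; i <= NF; i++)        # start at field 2: skip the first output column
    if ($i ~ /^14[01345689]/) {    # field begins with 140, 141, 143, ..., 149
      print                        # print the whole row once
      next                         # and stop checking further fields
    }
}' > myoutput.txt
```

Unlike the character-offset regex, this catches a match in any of the 35 data columns, not only the one at a fixed position.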