unix awk lines text-processing

Remove lines with specific pattern


I have a file with this specific format:

T   11722   A   330:0:0:0:0:0   315:0:0:0:0:0
T   11723   B   0:330:0:0:0:0   0:316:0:0:0:0
T   11725   C   0:327:0:0:0:0   0:314:0:0:0:0
T   11726   D   330:0:0:0:0:0   314:0:0:0:0:0
T   11727   E   0:6:0:323:0:0   0:6:0:309:0:0
T   11728   F   0:0:0:328:0:0   0:1:0:314:0:0
T   11729   G   0:325:0:0:0:0   0:315:0:0:0:0

I would like to remove any line that does not have at least two non-zero values in column 4 or column 5.

For instance, if a line has the specific format:

T   11722   A   330:0:0:0:0:0   315:0:0:0:0:0

remove it.

If it has the following format (two non-zero values per column in columns 4 and 5):

T   11727   E   0:6:0:323:0:0   0:6:0:309:0:0

Keep it.

Thus, the expected result should be:

T   11727   E   0:6:0:323:0:0   0:6:0:309:0:0
T   11728   F   0:0:0:328:0:0   0:1:0:314:0:0

I have no idea how to set this up under Unix, but I am guessing there is an easy way to do it. Any help would be greatly appreciated.

Many thanks


Solution

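  • Are you just trying to print lines where there are two or more non-zero values in $4 or $5? gsub() returns the number of substitutions it made, and replacing each match with itself ("&") leaves the value unchanged, so that'd be: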

    $ awk 'gsub(/[1-9][0-9]*/,"&",$4)>1 || gsub(/[1-9][0-9]*/,"&",$5)>1' file
    T 11727 E 0:6:0:323:0:0 0:6:0:309:0:0
    T 11728 F 0:0:0:328:0:0 0:1:0:314:0:0
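
    Note that the command above assigns to $4 and $5 (even though the values do not change), so awk rebuilds each record with single spaces, as the output shows. If the original spacing matters, one possible variant is to count the non-zero entries without touching the fields. This is only a sketch under assumptions: the helper names keep_multi and count_nonzero are made up for illustration and are not part of the original answer, and it assumes every colon-separated entry is a plain number.

```shell
# keep_multi and count_nonzero are hypothetical names, not from the original answer.
keep_multi() {
  awk '
    # Return how many colon-separated entries in s are non-zero numbers.
    function count_nonzero(s,    parts, n, i, c) {
      n = split(s, parts, ":")
      c = 0
      for (i = 1; i <= n; i++)
        if (parts[i] + 0 != 0) c++
      return c
    }
    # Print the untouched record when either column has 2 or more non-zero entries.
    count_nonzero($4) > 1 || count_nonzero($5) > 1
  ' "$@"
}
```

    Calling keep_multi file should print the same two lines (E and F) with their original whitespace intact, because no field is ever assigned. With no argument it reads standard input.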