I have a file with a list of value ranges:
2 4
6 9
13 14
and a second file that looks like this:
HiC_scaffold_1 1 26
HiC_scaffold_1 2 27
HiC_scaffold_1 3 27
HiC_scaffold_1 4 31
HiC_scaffold_1 5 34
HiC_scaffold_1 6 35
HiC_scaffold_1 7 37
HiC_scaffold_1 8 37
HiC_scaffold_1 9 38
HiC_scaffold_1 10 39
HiC_scaffold_1 11 39
HiC_scaffold_1 12 39
HiC_scaffold_1 13 39
HiC_scaffold_1 14 39
HiC_scaffold_1 15 42
I would like to exclude every row of file 2 whose column-2 value falls within any of the ranges listed in file 1 (bounds inclusive). The desired output is:
HiC_scaffold_1 1 26
HiC_scaffold_1 5 34
HiC_scaffold_1 10 39
HiC_scaffold_1 11 39
HiC_scaffold_1 12 39
HiC_scaffold_1 15 42
I know how to extract a single range with awk:
awk '$2 == "2", $2 == "4"' file2.txt
but file 1 has many ranges (one per line), and I need to exclude, rather than extract, the rows that fall inside them.
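For a single range, excluding instead of extracting is just the negated numeric test. The sketch below wraps it in a hypothetical helper named `exclude_one_range`. Unlike the range pattern `$2 == "2", $2 == "4"`, which switches on at the first matching line and off at the next exact match, it compares numerically and does not depend on the row order:

```shell
# exclude_one_range LO HI FILE   (hypothetical helper name)
# Prints the lines of FILE whose column 2 is NOT inside the inclusive range LO..HI.
# The comparison is numeric, so the rows do not need to be sorted by column 2.
exclude_one_range() {
  awk -v lo="$1" -v hi="$2" '!($2 >= lo + 0 && $2 <= hi + 0)' "$3"
}
```

Usage: `exclude_one_range 2 4 file2.txt`. This still handles only one range per pass, which is why the many-ranges case needs a different approach.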
This is one way:
$ awk '
NR==FNR { # first file
min[NR]=$1 # store mins and maxes in pairs
max[NR]=$2
next
}
{ # second file
for(i in min)
if($2>=min[i]&&$2<=max[i])
next
}1' ranges data
Output:
HiC_scaffold_1 1 26
HiC_scaffold_1 5 34
HiC_scaffold_1 10 39
HiC_scaffold_1 11 39
HiC_scaffold_1 12 39
HiC_scaffold_1 15 42
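If the ranges are small and integer-valued but the data file is huge, you can speed things up by first building an exclusion map: a hash holding every integer covered by any range, so each data line needs a single lookup instead of a loop over all the ranges: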
$ awk '
NR==FNR { # ranges file
for(i=$1;i<=$2;ex[i++]); # each value in the range goes to exclude hash
next
}
!($2 in ex)' ranges data # print if not found in ex hash
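Both programs above have a limit: the first checks every range for every data line, and the second stores one hash key per integer covered, which becomes impractical when the ranges are wide or not integer-valued. One alternative, not part of the original answer, is to sort the ranges once, merge overlapping ones, and binary-search them for each data line, so each lookup costs O(log n) in the number of ranges. A minimal sketch, assuming POSIX `sort` and `awk`, whitespace-separated files with inclusive numeric bounds in columns 1 and 2 of the ranges file, and a hypothetical function name `exclude_ranges_bsearch`:

```shell
# exclude_ranges_bsearch RANGES DATA   (hypothetical helper name)
# Prints the lines of DATA whose column 2 lies outside every inclusive range
# [col1, col2] in RANGES. Like the programs above, it assumes RANGES is non-empty.
exclude_ranges_bsearch() {
  sort -n -k1,1 "$1" | awk '
    NR == FNR {                             # sorted ranges, read from stdin ("-")
      if (n > 0 && $1 + 0 <= hi[n]) {       # overlaps the previous range: merge
        if ($2 + 0 > hi[n]) hi[n] = $2 + 0
      } else {                              # disjoint: start a new merged range
        n++; lo[n] = $1 + 0; hi[n] = $2 + 0
      }
      next
    }
    {
      v = $2 + 0
      l = 1; r = n; k = 0                   # k = last merged range with lo[k] <= v
      while (l <= r) {
        m = int((l + r) / 2)
        if (lo[m] <= v) { k = m; l = m + 1 } else r = m - 1
      }
      if (k > 0 && v <= hi[k]) next         # v is inside merged range k: drop the line
      print
    }' - "$2"
}
```

On the sample files, `exclude_ranges_bsearch ranges data` prints the same six lines as the expected output. Merging is what makes the binary search safe: with unmerged ranges such as `1 10` and `2 3`, the last range starting at or below 5 is `2 3`, which would wrongly let 5 through. For a modest number of ranges, the plain loop is simpler and fast enough.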