Let me explain my problem using a dummy example. This is file A -
1 10 20 aa
2 30 40 bb
3 60 70 cc
. .. .. ..
And this is file B -
10 15 xx yy mm
21 29 mm nn ss
11 18 rr tt yy
69 90 qq ww ee
.. .. .. .. ..
I am trying to merge files A and B so that each output line pairs a row of A with a row of B whose ranges overlap.
In my case, a row of A overlaps a row of B when the range from $2 to $3 of A's row has something in common with the range from $1 to $2 of B's row. In the example above, range(10,20) and range(10,15) overlap. Here range(10,20) = [10,11,12,13,14,15,16,17,18,19] and range(10,15) = [10,11,12,13,14].
So the expected output is -
1 10 20 aa 10 15 xx
1 10 20 aa 11 18 rr
3 60 70 cc 69 90 qq
I tried this way (using Python and awk):
for peak in State.peaks:
    i = peak[-1]
    peak = peak[:-1]
    a = peak[1]
    b = peak[2]
    d = State.delta
    c = ''' awk '{id=%d;delta=%d;a=%d;b=%d;x=%s;y=%s;if((x<=a&&y>a)||(x<=b&&y>b) || (x>a&&y<=b)) print id" "$7" "$3-$2} ' %s > %s ''' % (i, d, a, b, "$2-d", "$3+d", State.fourD, "file"+str(name))
    os.system(c)
I want to remove the Python part completely, as it takes too much time.
This Awk script does the job:
NR == FNR { record[NR] = $0; lo[NR] = $2; hi[NR] = $3; nrecs = NR; next }
NR != FNR { # Overlap: lo[A] < hi[B] && lo[B] < hi[A]
            for (i = 1; i <= nrecs; i++)
            {
                if (lo[i] < $2 && $1 < hi[i])
                    print record[i], $1, $2, $3
            }
          }
I saved it as range-merge-53.awk (53 is simply a random double-digit prime). I created file.A and file.B from your sample data, and ran:
$ awk -f range-merge-53.awk file.A file.B
1 10 20 aa 10 15 xx
1 10 20 aa 11 18 rr
3 60 70 cc 69 90 qq
$
The key is the 'overlap' condition, which must exclude the high value of each range. Such a range is often denoted [lo..hi) and is called a half-open (closed-open) range.
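For illustration only, here is a minimal Python sketch of the same overlap test, checked against the question's range(...) definition. The function name overlaps and the in-memory tuples standing in for file.A and file.B are assumptions made for this example; the Awk script above is what actually does the work.

```python
def overlaps(lo_a, hi_a, lo_b, hi_b):
    """Return True if the half-open ranges [lo_a, hi_a) and [lo_b, hi_b) share a value."""
    return lo_a < hi_b and lo_b < hi_a


# Sample rows from the question (hypothetical in-memory stand-ins for the files).
file_a = [(1, 10, 20, "aa"), (2, 30, 40, "bb"), (3, 60, 70, "cc")]
file_b = [(10, 15, "xx"), (21, 29, "mm"), (11, 18, "rr"), (69, 90, "qq")]

# Same loop order as the Awk script: for each B row, scan every A row.
matches = [
    (a[0], b[0], b[1])
    for b in file_b
    for a in file_a
    if overlaps(a[1], a[2], b[0], b[1])
]

# Brute-force cross-check against the set-based definition, for non-empty integer ranges.
for lo_a in range(6):
    for hi_a in range(lo_a + 1, 7):
        for lo_b in range(6):
            for hi_b in range(lo_b + 1, 7):
                shared = set(range(lo_a, hi_a)) & set(range(lo_b, hi_b))
                assert overlaps(lo_a, hi_a, lo_b, hi_b) == bool(shared)
```

Because the high value is excluded, ranges that merely touch, such as [10..20) and [20..30), do not count as overlapping.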
It would be possible to omit either the next or the NR != FNR condition (but not both) and the code would work as well.
See also Determine whether two date ranges overlap — the logic of ranges applies to dates and integers and floating point, etc.