I want to create a hash key in Perl that looks like (lowerR-10,UpperR-12) => 1, where (lowerR-10,UpperR-12) is the key and 1 is its value.
I have a file like the one below, and I need to find the overlap among its elements.
A 10 12
A 10 15
The expected output is:
A 10 12 2
A 12 15 1
The last column shows how many elements overlap in each range. I would like to store that count in a hash, and I think the key should look like (lowerR-10,UpperR-12). Any other suggestions for how to store the key would be welcome too.
Thanks
Maybe the program below will get you close to a solution.
#!/usr/bin/perl
use strict;
use warnings;
use Set::IntSpan;
use Sort::Naturally;
my %data;
# Count how many intervals cover each integer position, per chromosome.
while (<DATA>) {
    my ($chr, @start_stop) = split;
    $data{$chr}{$_}++ for $start_stop[0] .. $start_stop[1];
}
for my $chr (nsort keys %data) {
    # Group positions by coverage count: count => [positions].
    my %counts;
    while (my ($range_int, $count) = each %{ $data{$chr} } ) {
        push @{ $counts{$count} }, $range_int;
    }
    for my $count (sort {$a <=> $b} keys %counts) {
        # Collapse the positions for this count into contiguous runs.
        my $set = Set::IntSpan->new(@{$counts{$count}});
        for my $run ($set->sets) {
            printf "%s %-10s count: %s\n", $chr, $run, $count;
        }
    }
    print "\n";
}
__DATA__
chr1 100 500
chr1 25 50
chr1 10 50
chr1 60 80
chr1 12 40
chr1 41 45
chr1 20 45
chr1 48 80
chr1 4 60
chr2 2 40
chr3 4 90
chr1 5 40
chr2 1 30
chr1 6 20
chr4 9 100
chr1 2 20
chr2 2 90
chr1 6 20
chr4 4 30
chr2 4 90
chr3 3 90
chr2 4 90
chr4 3 90
chr2 4 30
It produced this output.
chr1 2-3 count: 1
chr1 100-500 count: 1
chr1 4 count: 2
chr1 51-59 count: 2
chr1 61-80 count: 2
chr1 5 count: 3
chr1 46-47 count: 3
chr1 60 count: 3
chr1 48-50 count: 4
chr1 6-9 count: 5
chr1 21-24 count: 5
chr1 41-45 count: 5
chr1 10-11 count: 6
chr1 25-40 count: 6
chr1 12-19 count: 7
chr1 20 count: 8
chr2 1 count: 1
chr2 2-3 count: 3
chr2 41-90 count: 3
chr2 31-40 count: 4
chr2 4-30 count: 6
chr3 3 count: 1
chr3 4-90 count: 2
chr4 3 count: 1
chr4 91-100 count: 1
chr4 4-8 count: 2
chr4 31-90 count: 2
chr4 9-30 count: 3
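On the question of how to store a range as a hash key: a Perl hash key is always a string, so one option is to join the two bounds into a single string, and another is a nested hash keyed first by the lower bound and then by the upper bound. Here is a minimal sketch of both, using the two output lines from the question. The helper names range_key and split_range_key, the hash names, and the "lower-upper" key format are my own assumptions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a string key such as "10-12" from two bounds.
sub range_key {
    my ($lower, $upper) = @_;
    return join '-', $lower, $upper;
}

# Hypothetical helper: recover the bounds from a key built by range_key().
# Assumes non-negative integers, so '-' is unambiguous as a separator.
sub split_range_key {
    my ($key) = @_;
    return split /-/, $key, 2;
}

# Flat hash: "lower-upper" => count
my %overlap;
$overlap{ range_key(10, 12) } = 2;
$overlap{ range_key(12, 15) } = 1;

# Alternative: nested hash, $nested{lower}{upper} => count
my %nested;
$nested{10}{12} = 2;
$nested{12}{15} = 1;

for my $key (sort keys %overlap) {
    my ($lower, $upper) = split_range_key($key);
    print "A $lower $upper $overlap{$key}\n";
}
```

The flat string key is easy to print and sort, while the nested form avoids having to split the key again when you need the bounds as numbers.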
Update: I'll try to explain. The %counts hash is created anew for each chromosome in the outer loop. Its keys are the counts seen for the numbered positions; say, number 42 was seen 5 times, so 5 is a key. The value for each count is an anonymous array holding every number that was seen that many times.
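To make that inversion concrete, here is a small sketch that uses the two intervals from the question (A 10 12 and A 10 15) and only core Perl. The sub name group_by_count is made up for this example, and positions are treated as inclusive integers, as in the program above.

```perl
use strict;
use warnings;

# Invert a position => count hash into count => [sorted positions].
sub group_by_count {
    my ($seen) = @_;
    my %counts;
    for my $pos (sort { $a <=> $b } keys %$seen) {
        push @{ $counts{ $seen->{$pos} } }, $pos;
    }
    return \%counts;
}

# Count how often each integer position is covered.
my %seen;
$seen{$_}++ for 10 .. 12;   # A 10 12
$seen{$_}++ for 10 .. 15;   # A 10 15

my $counts = group_by_count(\%seen);
for my $count (sort { $a <=> $b } keys %$counts) {
    print "$count: @{ $counts->{$count} }\n";
}
# prints:
# 1: 13 14 15
# 2: 10 11 12
```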
Set::IntSpan is used to build the ranges (6-9, 21-24, 41-45) from the anonymous array for a given count, which here holds 6,7,8,9,21,22,23,24,41,42,43,44,45 as elements. The line for my $run ($set->sets) gets each run for the numbers seen 5 times (6-9, 21-24, 41-45) so that it can be printed.
You can look at the documentation for Set::IntSpan, although it doesn't provide many helpful examples, and I haven't found any other good examples by searching the net, sorry. Basically, you feed Set::IntSpan numbers or ranges of numbers, and it can give you back either the condensed runs (6-9, 21-24, etc.) or each individual number in the set, depending on which Set::IntSpan method you use to access the data held by the object.
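If it helps to see what "condensing into runs" means, here is a hand-written sketch in core Perl. It is only a stand-in for illustration and not how Set::IntSpan is implemented; the sub name to_runs is made up.

```perl
use strict;
use warnings;

# Collapse a list of integers into run strings such as "6-9" or "20".
sub to_runs {
    my @nums = sort { $a <=> $b } @_;
    my @runs;
    while (@nums) {
        my $start = my $end = shift @nums;
        # Extend the run while the next number is adjacent (or a duplicate).
        while (@nums && $nums[0] <= $end + 1) {
            $end = shift @nums;
        }
        push @runs, $start == $end ? "$start" : "$start-$end";
    }
    return @runs;
}

# The numbers seen 5 times on chr1 in the output above.
print join(', ', to_runs(6 .. 9, 21 .. 24, 41 .. 45)), "\n";
# prints: 6-9, 21-24, 41-45
```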
Hope this clears up some questions you had. :-)