Can I create a hash key in Perl like this: (lowerR-10,UpperR-12) => 1?

I want to create a Perl hash entry whose key looks like (lowerR-10,UpperR-12) and whose value is 1, i.e. (lowerR-10,UpperR-12) => 1.
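As a point of reference, here is a minimal core-Perl sketch. A Perl hash key is just a string, so the literal text can be used as a key as long as it is quoted. The hash name %overlap and the "10-12" key format are illustrative choices, not anything from the question; a key built only from the two bounds is simpler to create and to split back apart.

```perl
use strict;
use warnings;

# Any string can be a hash key. It must be quoted here, because "=>" only
# auto-quotes a left-hand side that looks like a simple identifier.
my %overlap = ( '(lowerR-10,UpperR-12)' => 1 );

# A key built from just the two bounds is easier to create and to parse.
my ($lower, $upper) = (10, 12);
my $key = "$lower-$upper";    # "10-12"
$overlap{$key} = 1;

# Split the key to get the numeric bounds back later.
my ($lo, $hi) = split /-/, $key;
print "$lo .. $hi => $overlap{$key}\n";
```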

Actually, I have a file like the one below, and I need to find the overlap among its elements.

A 10 12
A 10 15

The expected output is:

A 10 12 2
A 12 15 1

The last column is the number of input ranges that cover that segment. I would like to save these counts in a hash, and I think the key should look like (lowerR-10,UpperR-12). Any other suggestions on how to build the key are welcome too.
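One possible alternative, sketched under the assumption that each range is half-open [start, end), which is what makes 10 12 and 12 15 line up in the expected output: record +1 at every start and -1 at every end, sweep the boundaries in order, and store each segment's count in a nested hash keyed first by element name and then by a "lo-hi" string. The %delta and %overlap names are illustrative.

```perl
use strict;
use warnings;

# The two ranges from the question, as [name, start, end].
my @rows = ( [ 'A', 10, 12 ], [ 'A', 10, 15 ] );

# Assumption: ranges are half-open [start, end). Record +1 where a range
# starts and -1 where it ends, separately for each element name.
my %delta;
for my $row (@rows) {
    my ($name, $start, $end) = @$row;
    $delta{$name}{$start}++;
    $delta{$name}{$end}--;
}

# Sweep the boundaries in ascending order. Between two consecutive
# boundaries, the running total is the number of ranges covering it.
my %overlap;    # $overlap{name}{"lo-hi"} = count
for my $name (sort keys %delta) {
    my @points = sort { $a <=> $b } keys %{ $delta{$name} };
    my $depth  = 0;
    for my $i (0 .. $#points - 1) {
        $depth += $delta{$name}{ $points[$i] };
        next unless $depth > 0;    # skip gaps that no range covers
        my ($lo, $hi) = @points[ $i, $i + 1 ];
        $overlap{$name}{"$lo-$hi"} = $depth;
        print "$name $lo $hi $depth\n";
    }
}
```

For the two rows above this prints "A 10 12 2" and "A 12 15 1". Because the key is a plain string, $overlap{A}{'10-12'} can be looked up directly later.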

Thanks

Solution

Maybe the program below will get you close to a solution.

#!/usr/bin/perl
use strict;
use warnings;
use Set::IntSpan;
use Sort::Naturally;

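my %data;    # $data{chromosome}{position} = number of ranges covering it

while (<DATA>) {
    my ($chr, @start_stop) = split;
    # Count every integer position in the inclusive range start..stop.
    $data{$chr}{$_}++ for $start_stop[0] .. $start_stop[1];
}

for my $chr (nsort keys %data) {
    # Invert the position counts: $counts{count} = [positions seen that often]
    my %counts;
    while (my ($range_int, $count) = each %{ $data{$chr} } ) {
        push @{ $counts{$count} }, $range_int;
    }

    for my $count (sort {$a <=> $b} keys %counts) {
        # Set::IntSpan merges the positions into runs of consecutive integers.
        my $set = Set::IntSpan->new(@{$counts{$count}});
        for my $run ($set->sets) {
            printf "%s %-10s count: %s\n", $chr, $run, $count;
        }
    }
    print "\n";
}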

__DATA__
chr1    100    500
chr1    25      50
chr1    10       50
chr1    60       80
chr1    12       40
chr1    41       45
chr1     20      45
chr1     48      80
chr1    4   60
chr2    2   40
chr3    4   90
chr1    5   40
chr2    1   30
chr1    6   20
chr4    9   100
chr1    2   20
chr2    2   90
chr1    6   20
chr4    4   30
chr2    4   90
chr3    3   90
chr2    4   90
chr4    3   90
chr2    4   30

It produced this output.

chr1 2-3        count: 1
chr1 100-500    count: 1
chr1 4          count: 2
chr1 51-59      count: 2
chr1 61-80      count: 2
chr1 5          count: 3
chr1 46-47      count: 3
chr1 60         count: 3
chr1 48-50      count: 4
chr1 6-9        count: 5
chr1 21-24      count: 5
chr1 41-45      count: 5
chr1 10-11      count: 6
chr1 25-40      count: 6
chr1 12-19      count: 7
chr1 20         count: 8

chr2 1          count: 1
chr2 2-3        count: 3
chr2 41-90      count: 3
chr2 31-40      count: 4
chr2 4-30       count: 6

chr3 3          count: 1
chr3 4-90       count: 2

chr4 3          count: 1
chr4 91-100     count: 1
chr4 4-8        count: 2
chr4 31-90      count: 2
chr4 9-30       count: 3

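Update: I'll try to explain. %data holds, for each chromosome, how many times each position was covered by an input range. The %counts hash is created anew for each chromosome in the outer loop and inverts that: its keys are the counts, and the value for each count is an anonymous array of all the positions seen that many times. For example, if position 42 was seen 5 times, 42 ends up in the array stored under the key 5.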
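A small core-Perl sketch of those two hashes, using two made-up inclusive ranges (1-5 and 3-8) instead of the __DATA__ section. The names %seen and %by_count are illustrative; they play the roles of %data and %counts for a single chromosome.

```perl
use strict;
use warnings;

# Hypothetical input: two inclusive ranges on one chromosome.
my @ranges = ( [ 1, 5 ], [ 3, 8 ] );

# Step 1 (like %data): count how often each position is seen.
my %seen;
for my $r (@ranges) {
    $seen{$_}++ for $r->[0] .. $r->[1];
}

# Step 2 (like %counts): invert it, so each count maps to the positions
# that were seen exactly that many times.
my %by_count;
while (my ($pos, $count) = each %seen) {
    push @{ $by_count{$count} }, $pos;
}

for my $count (sort { $a <=> $b } keys %by_count) {
    # each() returns keys in no particular order, so sort the positions.
    my @positions = sort { $a <=> $b } @{ $by_count{$count} };
    print "count $count: @positions\n";
}
```

This prints "count 1: 1 2 6 7 8" and "count 2: 3 4 5".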

Set::IntSpan is then used to collapse each anonymous array into ranges. For chr1, the array for count 5 holds 6, 7, 8, 9, 21, 22, 23, 24, 41, 42, 43, 44 and 45, and the line for my $run ($set->sets) walks over its runs (6-9, 21-24, 41-45) and prints each one. You can look at the documentation for Set::IntSpan, although it doesn't provide many helpful examples, and I haven't been able to find other good examples by searching the net, sorry. But basically, you feed Set::IntSpan ranges of numbers, and depending on which method you use to read the data back out of a Set::IntSpan object, it can give you either the condensed runs (6-9, 21-24, etc.) or each individual number in the set.
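To see roughly what Set::IntSpan is doing here, this is a hand-rolled core-Perl sketch of the run-collapsing step. The runs() helper is hypothetical and not part of Set::IntSpan; it only mimics the run-list format the program prints ("6-9" for a run, "20" for a single number).

```perl
use strict;
use warnings;

# Collapse integers into runs of consecutive values: "6-9" for a run,
# "20" for a lone number. Duplicates and input order do not matter.
sub runs {
    my %uniq   = map { $_ => 1 } @_;
    my @sorted = sort { $a <=> $b } keys %uniq;
    return () unless @sorted;

    my @runs;
    my ($start, $prev) = ($sorted[0]) x 2;
    for my $n (@sorted[1 .. $#sorted]) {
        if ($n == $prev + 1) {
            $prev = $n;    # still inside the current run
            next;
        }
        push @runs, $start == $prev ? $start : "$start-$prev";
        ($start, $prev) = ($n) x 2;    # start a new run
    }
    push @runs, $start == $prev ? $start : "$start-$prev";
    return @runs;
}

my @seen_five_times = (6 .. 9, 21 .. 24, 41 .. 45);
print join(', ', runs(@seen_five_times)), "\n";    # 6-9, 21-24, 41-45
```

In the real program Set::IntSpan does this (and much more, such as unions and intersections), so the module is still the better choice; the sketch is only meant to demystify the output.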

Hope this clears up some questions you had. :-)