Search code examples
perlhashperl-hash

Can I create a hash key in perl like this (lowerR-10,UpperR-12) => 1


I want to create a hash key in perl hash key that looks like this (lowerR-10,UpperR-12) => 1. Here the key is (lowerR-10,UpperR-12) and its value is 1.

Actually I have a file like this. I have to find the overlap among the the elements.

A 10 12

A 10 15

Whose output will be

A 10 12 2

A 12 15 1

The last column shows the overlap among the elements. I would like to save the count in a hash for which I think the key should be like (lowerR-10,UpperR-12) this. If anyone can give some new suggestion regarding how to save the key it will be great too.

Thanks


Solution

  • Maybe the program below will get you close to a solution.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Set::IntSpan;
    use Sort::Naturally;
    
    my %data;
    
    while (<DATA>) {
        my ($chr, @start_stop) = split; 
        $data{$chr}{$_}++ for $start_stop[0] .. $start_stop[1];
    }
    
    for my $chr (nsort keys %data) {
        my %counts;
        while (my ($range_int, $count) = each %{ $data{$chr} } ) {
            push @{ $counts{$count} }, $range_int;  
        }
    
        for my $count (sort {$a <=> $b} keys %counts) {
            my $set = Set::IntSpan->new(@{$counts{$count}});
            for my $run ($set->sets) {
                printf "%s %-10s count: %s\n", $chr, $run, $count;
            }
        }
        print "\n";
    }
    
    __DATA__
    chr1    100    500
    chr1    25      50
    chr1    10       50
    chr1    60       80
    chr1    12       40
    chr1    41       45
    chr1     20      45
    chr1     48      80
    chr1    4   60
    chr2    2   40
    chr3    4   90
    chr1    5   40
    chr2    1   30
    chr1    6   20
    chr4    9   100
    chr1    2   20
    chr2    2   90
    chr1    6   20
    chr4    4   30
    chr2    4   90
    chr3    3   90
    chr2    4   90
    chr4    3   90
    chr2    4   30
    

    It produced this output.

    chr1 2-3        count: 1
    chr1 100-500    count: 1
    chr1 4          count: 2
    chr1 51-59      count: 2
    chr1 61-80      count: 2
    chr1 5          count: 3
    chr1 46-47      count: 3
    chr1 60         count: 3
    chr1 48-50      count: 4
    chr1 6-9        count: 5
    chr1 21-24      count: 5
    chr1 41-45      count: 5
    chr1 10-11      count: 6
    chr1 25-40      count: 6
    chr1 12-19      count: 7
    chr1 20         count: 8
    
    chr2 1          count: 1
    chr2 2-3        count: 3
    chr2 41-90      count: 3
    chr2 31-40      count: 4
    chr2 4-30       count: 6
    
    chr3 3          count: 1
    chr3 4-90       count: 2
    
    chr4 3          count: 1
    chr4 91-100     count: 1
    chr4 4-8        count: 2
    chr4 31-90      count: 2
    chr4 9-30       count: 3
    

    Update: I'll try to explain. The %counts hash is created anew for each chromosome from the outer loop. The keys are the counts of each numbered position, say number 42 was seen 5 times. The value for each count is an anonymous array that has all numbers that were seen 5 times.

    Set::IntSpan is used to create ranges, (6-8, 21-24, 41-45), from the anonymous array, (which has 6,7,8,21,22,23,24,41,42,43,44,45 as elements in the anon array). The line for my $run ($set->sets), gets each run list for numbers seen 5 times, (6-8, 21-24, 41-45) and then prints them. You can look at the documentation for Set::IntSpan although it doesn't provide many helpful examples, and I've not been able to find any other good examples by net search, sorry. But basically, you feed Set::IntSpan ranges of numbers and it can give you the condensed subsets, (6-8, 21-24, etc), or each individual number in a set depending on the Set::IntSpan method you use to access the data held by a IntSpan object.

    Hope this clears up some questions you had. :-)