
How to get all features in a range from a GFF3 file in Perl?


I would like to write a Perl function that takes a GFF3 filename and a range (e.g. 100000 .. 2000000) and returns a reference to an array containing the names/accessions of all genes found in that range.

I guess using BioPerl makes sense, but I have very little experience with it. I can write a script that parses a GFF3 file by myself, but if using BioPerl (or another package) is not too complicated, I'd rather reuse their code.


Solution

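    use strict;
    use warnings;
    use Bio::Tools::GFF;
    
    # Path to the GFF3 file, taken from the first command-line argument.
    my $gff_file = shift @ARGV or die "Usage: $0 <file.gff3>\n";
    
    my $range_start = 100000;
    my $range_end   = 200000;
    
    my @features_in_range = ( );
    
    my $gffio = Bio::Tools::GFF->new(-file => $gff_file, -gff_version => 3);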
    
    while (my $feature = $gffio->next_feature()) {
    
        ## Note: this check only keeps features that lie entirely within the
        ## coordinate range. Features that overlap the range but extend past
        ## either end are not caught.
        if (
            ($feature->start() >= $range_start)
            &&
            ($feature->end()   <= $range_end)
           ) {
    
            push @features_in_range, $feature;
    
        }
    
    }
    
    $gffio->close();
    

    DISCLAIMER: This is a naive implementation. I just banged it out and it has had no testing; I won't even guarantee it compiles.
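
The question asks for a function that returns a reference to an array of gene names, and the comment in the loop above raises the case of features that only partly overlap the range. A minimal sketch of both follows. It is untested against real data; `ranges_overlap` and `genes_in_range` are hypothetical names introduced here, and it assumes that genes are features of type `gene`, that each carries a `Name` or `ID` attribute, and that the file describes a single sequence (the sequence ID is not checked).

```perl
use strict;
use warnings;

# True when the closed intervals [s1, e1] and [s2, e2] share at least one base.
sub ranges_overlap {
    my ($s1, $e1, $s2, $e2) = @_;
    return ($s1 <= $e2 && $e1 >= $s2) ? 1 : 0;
}

# Hypothetical helper: returns an array reference holding one name per
# 'gene' feature that overlaps [$range_start, $range_end].
sub genes_in_range {
    my ($gff_file, $range_start, $range_end) = @_;

    require Bio::Tools::GFF;    # BioPerl is loaded on the first call
    my $gffio = Bio::Tools::GFF->new(-file => $gff_file, -gff_version => 3);

    my @names;
    while (my $feature = $gffio->next_feature()) {
        next unless $feature->primary_tag eq 'gene';
        next unless ranges_overlap($feature->start, $feature->end,
                                   $range_start, $range_end);

        # Assumption: prefer the Name attribute, fall back to ID.
        for my $tag (qw(Name ID)) {
            if ($feature->has_tag($tag)) {
                push @names, ($feature->get_tag_values($tag))[0];
                last;
            }
        }
    }
    $gffio->close();

    return \@names;
}
```

A call would look like `my $genes = genes_in_range('annotation.gff3', 100_000, 200_000);` (the filename is a placeholder), after which `@$genes` holds the names. To keep only fully contained genes, as in the original loop, replace the `ranges_overlap` test with the start/end comparison used there.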