I would like to write a Perl function that takes a GFF3 filename and a range (e.g. 100000 .. 2000000) and returns a reference to an array containing the names/accessions of all genes found in this range.
I guess using BioPerl would make sense, but I have very little experience with it. I can write a script that parses a GFF3 file myself, but if using BioPerl (or another package) is not too complicated, I'd rather reuse existing code.
use strict;
use warnings;
use Bio::Tools::GFF;

# Path to the GFF3 file, taken from the command line.
my $gff_file = shift @ARGV or die "Usage: $0 <file.gff3>\n";
my $range_start = 100000;
my $range_end = 200000;
my @features_in_range = ( );

my $gffio = Bio::Tools::GFF->new(-file => $gff_file, -gff_version => 3);
while (my $feature = $gffio->next_feature()) {
    ## This check only keeps features that lie entirely within the range.
    ## Features that overlap the range without being contained in it
    ## won't be caught by this check.
    if (
        ($feature->start() >= $range_start)
        &&
        ($feature->end() <= $range_end)
    ) {
        push @features_in_range, $feature;
    }
}
$gffio->close();
DISCLAIMER: Naive implementation. I just banged that out, it's had no testing. I won't even guarantee it compiles.
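For comparison, here is a rough sketch of the function the question asks for, written without BioPerl so it can be run as is. It also answers the overlap question raised in the code comment: two closed intervals overlap when each one starts no later than the other ends. The names overlaps and gene_names_in_range are made up for this example. The sketch assumes a tab-separated, 9-column GFF3 file, only looks at features of type "gene", takes the Name attribute (falling back to ID) as the gene name, and ignores the sequence ID column, so it is only meaningful for a single chromosome/contig.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the closed intervals [s1,e1] and [s2,e2]
# share at least one position.
sub overlaps {
    my ($s1, $e1, $s2, $e2) = @_;
    return $s1 <= $e2 && $e1 >= $s2;
}

# Hypothetical function: returns an array ref with the names (Name, else ID)
# of all 'gene' features in $gff_file that overlap [$range_start, $range_end].
# Assumption: the seqid column is ignored, i.e. one chromosome per file.
sub gene_names_in_range {
    my ($gff_file, $range_start, $range_end) = @_;
    my @names;

    open(my $fh, '<', $gff_file) or die "Cannot open $gff_file: $!";
    while (my $line = <$fh>) {
        chomp $line;
        last if $line =~ /^##FASTA/;               # sequence section; no more features
        next if $line =~ /^#/ || $line !~ /\S/;    # skip comments and blank lines

        my @col = split /\t/, $line;
        next unless @col == 9 && $col[2] eq 'gene';
        next unless overlaps($col[3], $col[4], $range_start, $range_end);

        # Column 9 holds semicolon-separated key=value attributes.
        my %attr;
        for my $pair (split /;/, $col[8]) {
            my ($key, $value) = split /=/, $pair, 2;
            $attr{$key} = $value if defined $value;
        }

        my $name = $attr{Name} // $attr{ID};
        push @names, $name if defined $name;
    }
    close($fh);

    return \@names;
}
```

Calling gene_names_in_range($gff_file, 100000, 200000) returns the array reference. To keep only genes fully contained in the range, replace the overlaps() call with the start/end comparison used in the BioPerl loop above.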