Tags: perl, bioinformatics, bioperl

How can I download the entire GenBank file with just an accession number?


I have an array of accession numbers, and I'd like to automatically download and save the corresponding GenBank files using BioPerl. I know how to grab the sequence information, but I want the entire GenBank record.

#!/usr/bin/env perl
use strict;
use warnings;
use Bio::DB::GenBank;

my @accession;
open(my $refined, '<', './refine.txt') || die "Could not open ./refine.txt: $!";

# Capture the field between the first two '|' characters as the accession.
while (<$refined>) {
    if (/^(\D+)\|(.*?)\|/) {
        push(@accession, $2);
    }
}
close $refined;

foreach my $number (@accession) {
    my $db_obj = Bio::DB::GenBank->new;
}

Solution

  • You can save the full GenBank records by using Bio::DB::EUtilities. Here is an example that takes a list of IDs and saves the GenBank record for each one to a single file called myseqs.gb:

    #!/usr/bin/env perl
    
    use strict;
    use warnings;
    use Bio::DB::EUtilities;
    
    my @ids = qw(1621261 89318838 68536103 20807972 730439);
    
    my $factory = Bio::DB::EUtilities->new(-eutil   => 'efetch',
                                           -db      => 'protein',
                                           -rettype => 'gb',
                                           -email   => 'you@example.com',  # placeholder: use your own address
                                           -id      => \@ids);
    
    my $file = 'myseqs.gb';
    
    # dump HTTP::Response content to a file (not retained in memory)
    $factory->get_Response(-file => $file);
    

    If you want to split the individual records returned instead of having them all in one file, this can easily be done with Bio::SeqIO. Check out the EUtilities HOWTO and the EUtilities Cookbook for more examples and explanation.
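
As a rough sketch of the Bio::SeqIO approach mentioned above, the script below reads a multi-record file such as myseqs.gb and writes each record to its own file. The `split_genbank` helper, the `split_gb.pl` script name, and the `<accession>.gb` naming scheme are assumptions made for illustration and are not part of the original answer. Because Bio::SeqIO parses each record and writes it out again, the output may differ in minor formatting from what NCBI returned.

```perl
#!/usr/bin/env perl

use strict;
use warnings;
use Bio::SeqIO;

# Split a multi-record GenBank file into one file per record.
# Returns the list of files written. The naming scheme
# (<accession>.gb inside $outdir) is an assumption for this sketch.
sub split_genbank {
    my ($infile, $outdir) = @_;

    my $in = Bio::SeqIO->new(-file => $infile, -format => 'genbank');
    my @written;

    while (my $seq = $in->next_seq) {
        my $outfile = "$outdir/" . $seq->accession_number . '.gb';

        # One output stream per record; '>' opens the file for writing.
        my $out = Bio::SeqIO->new(-file => ">$outfile", -format => 'genbank');
        $out->write_seq($seq);
        $out->close;

        push @written, $outfile;
    }

    return @written;
}

# Run only when an input file is given on the command line:
#   perl split_gb.pl myseqs.gb [outdir]
if (@ARGV) {
    my ($infile, $outdir) = @ARGV;
    $outdir //= '.';
    print "$_\n" for split_genbank($infile, $outdir);
}
```

The output directory is expected to exist already. If byte-for-byte copies of the NCBI records matter, a plain-text split on the `//` record terminator lines may be a better fit than round-tripping through Bio::SeqIO.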