Search code examples
perlsortingnamespacesbioperl

Trying to custom-sort a phylogenetic tree in BioPerl results in "undefined value" or "unblessed reference" error


I have a phylogenetic tree I am loading in BioPerl. I want to use a custom sort function ("$depth_sort" below) to order the nodes vertically when printing out the tree. However the documented method used below writes out the tree in the same order it was read in, without actually sorting:

use strict;
use Bio::TreeIO;

my $input = new Bio::TreeIO(-file=>"input.tree", -format=>'newick');
my $tree = $input->next_tree;

my $depth_sort = sub { $a->depth <=> $b->depth };

#this is the way it is supposed to work:
my $out = new Bio::TreeIO(
     -file     => ">sorted.tree", 
     -format   => 'newick', 
     -order_by => $depth_sort
);
$out->write_tree($tree);

Instead, I can manually iterate through the tree to see what is happening. If I do my own sort locally, it works as expected:

my $rnode = $tree->get_root_node;
&printNode($rnode);

sub printNode {
    my $thisNode = shift;

    #print the node identity and depth, just to keep track:
    print "this is " . $thisNode->internal_id . 
          " with depth " . $thisNode->depth . "\n";

    if (! $thisNode->is_Leaf) {
        my @children = sort $depth_sort $thisNode->each_Descendent();
        for my $otherNode (@children) { &printNode($otherNode); }
    }
}

But, if I pass the custom sort to each_Descendent (the way the write call above is supposed to):

my @children = $thisNode->each_Descendent($depth_sort);

Then it dies with the message:

Can't call method "depth" on an undefined value at treeFlipper.pl line 7, line 1.

I found another thread here that I think has me on the right track, but I haven't solved it yet: Perl sorting; dealing with $a, $b package globals across namespaces cleanly

Switching to the prototyping method in the first answer:

my $depth_sort = sub ($$) { 
    my ($a1,$b1) = @_;
    return ($a1->depth <=> $b1->depth) };

gives me:

Can't call method "depth" on unblessed reference at treeFlipper.pl line 9, line 1.

But if I check the ref type, that seems correct:

print ref($a1) . "\n";

gives

Bio::Tree::Node

When I try force-blessing the reference (this is probably a terrible thing to do):

bless $a1, 'Bio::Tree::Node';
bless $b1, 'Bio::Tree::Node';

I get:

Not a HASH reference at /usr/local/share/perl/5.14.2/Bio/Tree/Node.pm line 433, line 1.

The other methods in that thread (using caller) give me the same old "undefined value" error.

Is this a problem with BioPerl, or (as I suspect) am I still missing something? Thanks!

edit: using Perl 5.14.2 on Ubuntu 12.04 LTS


Solution

  • There turned out to be three parts to this:

    1. The sort function does need to be prototyped and indexed directly OR accessed through the calling package, as documented (for instance) in the other thread I linked to.

    2. Why it still didn't work after fixing the sort function is a bit of a mystery, but is probably non-reproducible. It was solved by reinstalling BioPerl.

    3. Finally, it turns out that the current version of BioPerl (I have 1.6.9) doesn't fully implement the "order_by" parameter --the Bio::TreeIO::newick constructor does not set the passed-in value like it should. However, I was able to get around this by explicitly setting the parameter after construction:

      my $out = new Bio::TreeIO(
           -file     => ">sorted.tree", 
           -format   => 'newick', 
      );
      $out->get_params->{'order_by'}=$depth_sort;
      $out->write_tree($tree);
      

    This worked as expected.