Search code examples
performanceperlsubroutineslowdown

Is it possible to optimize performance in this part


I have a problem with performance in Perl. This is the code: http://pastebin.com/jpmhv395

It might have problems in other places as well, but the main problem is in line 336: The anagram_hash method seems to be called very often. The method actually is in a different module, here it is: http://pastebin.com/5NRC4bs8

The subroutine should work differently depending on whether an integer or a string was passed as argument.

Is the subroutine 'anagram_hash' causing the poor performance, or do you see anything other that could cause a drop in performance? If so, how could it be optimized?


Solution

  • I guess you could make a 256 element lookup table so you just do

    $result += $lookup{$char};
    

    instead of

    my $temp = ord($char);
    $result += $temp**5;
    

    but you should really run the profiler to see what the problem is first... here.

    EDIT (jm666 and ikegami) - Added the Benchmark example. As you can see by observing the results of power_goodloop and lookup_goodloop which vary only by whether exponentiation or a hash lookup is used, exponentiation is much faster. It's the poor loop that's slowing you down.

    use strict;
    use warnings;
    use feature qw( say );
    use Benchmark qw(:all);
    
    my @lookup = map { $_ ** 5 } 0..255;
    my %lookup = map { chr($_) => $_ ** 5 } 0..255;
    
    my $str = join '', map chr(rand(256)), 1..1000;
    
    say "test of the result";
    say anagram_hash1($str);
    say anagram_hash2($str);
    say anagram_hash3($str);
    say anagram_hash4($str);
    say anagram_hash5($str);
    say "";    
    cmpthese(-3, {
        'power_badloop'    => sub { anagram_hash1($str) },
        'hlookup_badloop'  => sub { anagram_hash2($str) },
        'power_goodloop'   => sub { anagram_hash3($str) },
        'hlookup_goodloop' => sub { anagram_hash4($str) },
        'alookup_goodloop' => sub { anagram_hash5($str) },
    });
    
    
    sub anagram_hash1 {
            my $result = 0;
            my $s      = shift;
            my $length = length($s);
            if ( $s =~ /[a-zA-Z]+/ ) {
                    for ( my $i = 0 ; $i < $length ; $i++ ) {
                            my $char = substr( $s, $i, 1 );
                            my $temp = ord($char);
                            $result += $temp**5;
                    }
            } elsif ( $s =~ /^[\d]+$/ ) {
                    my $temp = int($s);
                    $result += $temp**5;
            } else {
                    die "Invalid parameter passed to method 'anagram_hash'\nExpected: String or Number\nPassed: $s";
            }
            return $result;
    }
    sub anagram_hash2 {
            my $result = 0;
            my $s      = shift;
            my $length = length($s);
            if ( $s =~ /[a-zA-Z]+/ ) {
                    for ( my $i = 0 ; $i < $length ; $i++ ) {
                            my $char = substr( $s, $i, 1 );
                            $result += $lookup{$char};
                    }
            } elsif ( $s =~ /^[\d]+$/ ) {
                    my $temp = int($s);
                    $result += $temp**5;
            } else {
                    die "Invalid parameter passed to method 'anagram_hash'\nExpected: String or Number\nPassed: $s";
            }
            return $result;
    }
    
    sub anagram_hash3 {
            my $result = 0;
            my $s      = shift;
            if ( $s =~ /[a-zA-Z]/ ) {
                    $result += $_ ** 5 for unpack "C*", $s;
            } elsif ( $s =~ /^[\d]+$/ ) {
                    $result += int($s) ** 5;
            } else {
                    die "Invalid parameter passed to method 'anagram_hash'\nExpected: String or Number\nPassed: $s";
            }
            return $result;
    }
    
    sub anagram_hash4 {
            my $result = 0;
            my $s      = shift;
            if ( $s =~ /[a-zA-Z]/ ) {
                    $result += $lookup{$_} for unpack "(a)*", $s;
            } elsif ( $s =~ /^[\d]+$/ ) {
                    $result += int($s) ** 5;
            } else {
                    die "Invalid parameter passed to method 'anagram_hash'\nExpected: String or Number\nPassed: $s";
            }
            return $result;
    }
    
    sub anagram_hash5 {
            my $result = 0;
            my $s      = shift;
            if ( $s =~ /[a-zA-Z]/ ) {
                    $result += $lookup[$_] for unpack "C*", $s;
            } elsif ( $s =~ /^[\d]+$/ ) {
                    $result += int($s) ** 5;
            } else {
                    die "Invalid parameter passed to method 'anagram_hash'\nExpected: String or Number\nPassed: $s";
            }
            return $result;
    }
    

    Output:

    test of the result
    171658778879381
    171658778879381
    171658778879381
    171658778879381
    171658778879381
    
                       Rate power_badloop hlookup_badloop hlookup_goodloop power_goodloop alookup_goodloop
    power_badloop    2132/s            --            -25%             -35%           -71%             -74%
    hlookup_badloop  2826/s           33%              --             -14%           -62%             -66%
    hlookup_goodloop 3294/s           55%             17%               --           -56%             -60%
    power_goodloop   7446/s          249%            163%             126%             --             -10%
    alookup_goodloop 8298/s          289%            194%             152%            11%               --
    

    So, the results showing:

    • the original OP's code is the slowest
    • the second is Mark's solution (replacing both of ord/exp with the hash lookup) - so, the Mark's solution is FASTER than the original OP's code.

    finally, (as usually) Ikegami comes with 3 solutions what are much faster as the any of previous. :)