Perl: count mismatches between two strings


I just need to count the mismatches between two strings. Let's say:

my $s1 = "ATCG";
my $s2 = "ATTG"; 

This should give 1 as the mismatch count. There is no need to find the positions or what the mismatches are.

I am looking for a fast way to do this. I thought that splitting the strings into arrays and comparing them in a loop, or using substr to compare each position, might be too slow, because more than a billion pairs need to be checked. Thanks.


Solution

  • Just XOR the two strings together. Each NUL character in the result represents a position where the characters are the same in both strings, so counting the non-NUL characters gives the number of mismatches.

    my ($s1, $s2) = qw( ATCG ATTG );
    
    my $count = ( $s1 ^ $s2 ) =~ tr/\0//c;
    
    print "$count\n";   # Prints "1"
    

    Note: If you're going to compare the same string repeatedly, pass it and each string you compare it against to utf8::downgrade to make sure the ^ is as fast as it can be.

    utf8::downgrade($s1);  # Change the internal format in which
    utf8::downgrade($s2);  #   the strings are stored to speed up $s1^$s2.
    

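    This does not help if either string contains Unicode characters above U+00FF: such a string cannot be downgraded, and utf8::downgrade dies unless it is passed a true second argument.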
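
For reuse over many pairs, the XOR-and-count idea can be wrapped in a small subroutine. The following is only a sketch: the name count_mismatches and the length check are assumptions added for illustration, not part of the answer above. The check is there because ^ treats the shorter operand as if it were padded with zero bits, so any overhang would otherwise be counted as mismatches. It also assumes the "bitwise" feature is not enabled; under "use feature 'bitwise'" or "use v5.28", ^ is numeric and the string operator is ^. instead.

```perl
use strict;
use warnings;

# Hypothetical helper (the name is an assumption, not from the answer):
# counts the positions at which two equal-length strings differ.
sub count_mismatches {
    my ($s1, $s2) = @_;

    # String XOR treats the shorter operand as zero-padded, so unequal
    # lengths would count the overhang as mismatches; reject them instead.
    die "strings differ in length\n" if length($s1) != length($s2);

    # Every non-NUL character in the XOR result marks a differing position.
    return ( $s1 ^ $s2 ) =~ tr/\0//c;
}

print count_mismatches('ATCG', 'ATTG'), "\n";   # prints 1
print count_mismatches('ATCG', 'ATCG'), "\n";   # prints 0
print count_mismatches('AAAA', 'TTTT'), "\n";   # prints 4
```

If unequal lengths should count the extra characters as mismatches instead, the length check can simply be dropped.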