I just need to count the mismatches between two strings. Let's say:
my $s1 = "ATCG";
my $s2 = "ATTG";
This should give 1 as the mismatch count. I don't need the positions or what the mismatches are.
I am looking for a fast way to do this. I thought that splitting into arrays and comparing in a loop, or using substr to compare each position, might be too slow because more than a billion pairs need to be checked. Thanks.
Just XOR the two strings together. Each NUL character in the result represents a position where the characters are the same in both strings, so counting the non-NUL characters gives the number of mismatches.
my ($s1, $s2) = qw( ATCG ATTG );
my $count = ( $s1 ^ $s2 ) =~ tr/\0//c;
print "$count\n"; # Prints "1"
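For reuse, the same XOR-and-count idea could be wrapped in a small helper. This is only a sketch: the name count_mismatches is made up for illustration, and the length check is an added assumption (when the strings differ in length, Perl's string XOR treats the shorter one as padded with NULs, so the extra characters of the longer string would be counted as mismatches).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original answer): counts the positions
# at which two equal-length strings differ, using string XOR.
# Note: under "use feature 'bitwise'" (enabled by "use v5.28" or later),
# ^ works on numbers and the string form is ^. instead.
sub count_mismatches {
    my ($s1, $s2) = @_;

    # Assumption: unequal lengths are treated as a caller error.
    die "Strings differ in length\n" if length($s1) != length($s2);

    # XOR yields "\0" wherever the characters agree; tr with /c counts
    # every character that is NOT "\0" without modifying anything.
    return ( $s1 ^ $s2 ) =~ tr/\0//c;
}

print count_mismatches("ATCG", "ATTG"), "\n";   # Prints "1"
print count_mismatches("ATCG", "ATCG"), "\n";   # Prints "0"
print count_mismatches("AAAA", "TTTT"), "\n";   # Prints "4"
```

With a billion pairs, the sub call itself adds overhead, so inlining the one-line expression in the hot loop may be preferable.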
Note: If you're going to compare a string repeatedly, pass it and the string you are comparing it against to utf8::downgrade to make sure the ^ is as fast as it can be.
utf8::downgrade($s1); # Change the internal format in which
utf8::downgrade($s2); # the strings are stored to speed up $s1^$s2.
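This does not help if either string contains Unicode characters above U+00FF: such a string cannot be downgraded, and utf8::downgrade dies unless you pass it a true second argument.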
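As a rough sketch of how the downgrade step might fit into a loop over many comparisons: the helper name mismatch_counts and the sample data are illustrative assumptions, not part of the answer above. It passes a true second argument to utf8::downgrade so that a string which cannot be stored as bytes is reported with a custom message rather than the built-in error.

```perl
use strict;
use warnings;

# Hypothetical helper: compares one query string against many targets,
# downgrading each string once before XORing.
sub mismatch_counts {
    my ($query, @targets) = @_;

    # With a true second argument, utf8::downgrade returns false instead
    # of dying when the string has characters above U+00FF.
    utf8::downgrade($query, 1)
        or die "Query contains characters above U+00FF\n";

    my @counts;
    for my $target (@targets) {
        utf8::downgrade($target, 1)
            or die "Target contains characters above U+00FF\n";
        # Count the non-NUL characters of the XOR result.
        push @counts, scalar( ( $query ^ $target ) =~ tr/\0//c );
    }
    return @counts;
}

my @counts = mismatch_counts("ATCG", qw( ATTG ATCG TTTT ));
print "@counts\n";   # Prints "1 0 3"
```

Whether the downgrade pays off depends on how the strings were produced; strings read as raw bytes are typically already in the downgraded form.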