I have two strings $base
and $succ
and want to check whether $succ
can be created from $base
by inserting arbitrary strings at arbitrary positions.
Some examples, written as isSucc($base
, $succ
):
We can assume, that $base
and $succ
do not contain any (vertical) withespace. Usually we have length($base) < length($succ) < 1000
. Good performance is not critical but would be nice to have.
I already know one way to implement isSucc
*.
isSucc
?isSucc
?Compute the edit/Levenshtein distance with a custom cost model (cost(Insertion)=0, cost(Deletion)=cost(substitution)=1). Then check whether the edit distance is 0.
I wanted to compare the three solutions loop, greedy matching, and non-greedy matching, but the matching methods often took more than 100 times as long as the loop solution so I aborted the tests. Nevertheless – or perhaps exactly for that reason – we have a clear winner: The loop solution.
Big thanks to Christoffer Hammarström.
sub is_subsequence {
my ($needles, $haystack) = @_;
my $found = 0;
for my $needle (split '', $needles) { # for each character $needle in $needles
$found = 1 + index $haystack, $needle, $found; # find it after the previous one in $haystack
return 0 unless $found; # return false if we can't
}
return 1; # return true if we found all $needles in $haystack
}
use Test::More tests => 6; # 1..6
is 1, is_subsequence("abc", "abcX"); # ok 1
is 1, is_subsequence("abc", "XabYcZ"); # ok 2
is 0, is_subsequence("abc", "abX"); # ok 3
is 0, is_subsequence("abc", "cab"); # ok 4
is 1, is_subsequence("abc", "cabc"); # ok 5
is 1, is_subsequence("", "anything"); # ok 6