
Perl: Find all matched substrings of two strings


Is there a function in Perl that finds every common substring (maximal by length) of string1 and string2?

I can find every occurrence of a known substring in a string using m/substring/g.

To find all common substrings, though, I would have to shift the starting position in string1 and compare the strings symbol by symbol. How can I do this in Perl, or is there an easier way (a ready-made function)?

Thank you in advance.

    my $string1 = '... (i==i)kn;i=n.n;k(i(i,"%i",&i);i ...';
    my $string2 = '... k;kn;i=n.n;k;k(i(i,"%i",&i);k ...';
    my @answer = ( ..., 'kn;i=n.n;', 'k(i(i,"%i",&i);', ... );


Solution

  • Your example seems to return substrings of two different lengths, with the shorter one first, so I'm not sure what "maximal by length" means. But this may help:

    use strict;
    use warnings;
    use Tree::Suffix;
    use Data::Dumper;

    my $string1 = '(i==i)kn;i=n.n;k(i(i,"%i",&i);i';
    my $string2 = 'k;kn;i=n.n;k;k(i(i,"%i",&i);k';
    my $tree = Tree::Suffix->new($string1, $string2);
    my @answer;
    my $min_length = 1;
    my $max_length = 0; # 0 initially means no limit
    # a plain while loop, because "last" cannot exit a do {} while block
    while (1) {
        # longest common substrings within the current length bounds
        my @by_length = $tree->lcs($min_length, $max_length);
        last unless @by_length;
        # don't include any substrings that are substrings of substrings already found
        for my $new_substring (@by_length) {
            push @answer, $new_substring
                unless grep { /\Q$new_substring\E/ } @answer;
        }
        # on the next pass, only look for strictly shorter substrings
        $max_length = length($by_length[0]) - 1;
        last if $max_length < $min_length;
    }
    print Dumper \@answer;
    

    output:

    $VAR1 = [
          ';k(i(i,"%i",&i);',
          'kn;i=n.n;k'
        ];
    

    Tree::Suffix was kind of a pain to install; I had to delete the included inc/Devel/CheckLib.pm because it had errors and install Devel::CheckLib separately, as well as downloading and installing the libstree library.
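
If installing Tree::Suffix is not an option, the "shift and compare symbol by symbol" idea from the question can be sketched in plain Perl with a small dynamic-programming table. This is only a sketch under some assumptions: `maximal_common_substrings` is a hypothetical helper name, not part of any module; "maximal" is taken to mean a matching run that cannot be extended at either end; and, as in the code above, results contained in a longer result are dropped. It takes time proportional to length(string1) × length(string2), so it suits short strings rather than large inputs.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): returns the common substrings of
# $s1 and $s2 that cannot be extended at either end, longest first.
sub maximal_common_substrings {
    my ($s1, $s2, $min_length) = @_;
    $min_length //= 1;
    my @chars1 = split //, $s1;
    my @chars2 = split //, $s2;
    my %seen;
    # $prev[$j] holds the length of the matching run ending just before
    # position $j of $s2 in the previous row of the table.
    my @prev = (0) x (@chars2 + 1);
    for my $i (1 .. scalar @chars1) {
        my @curr = (0) x (@chars2 + 1);
        for my $j (1 .. scalar @chars2) {
            next unless $chars1[$i - 1] eq $chars2[$j - 1];
            my $len = $curr[$j] = $prev[$j - 1] + 1;
            # The run is complete when the next pair of characters differs
            # or either string ends.
            my $extendable = $i < @chars1 && $j < @chars2
                && $chars1[$i] eq $chars2[$j];
            $seen{ substr($s1, $i - $len, $len) } = 1
                if !$extendable && $len >= $min_length;
        }
        @prev = @curr;
    }
    # Longest first; skip anything contained in a longer result.
    my @result;
    for my $candidate (sort { length($b) <=> length($a) || $a cmp $b } keys %seen) {
        push @result, $candidate
            unless grep { index($_, $candidate) >= 0 } @result;
    }
    return @result;
}

my $string1 = '(i==i)kn;i=n.n;k(i(i,"%i",&i);i';
my $string2 = 'k;kn;i=n.n;k;k(i(i,"%i",&i);k';
print "$_\n" for maximal_common_substrings($string1, $string2);
```

Passing a third argument, for example `maximal_common_substrings($string1, $string2, 3)`, ignores runs shorter than three characters.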