Perl - partial pattern matching in a sequence of letters


I am trying to find a pattern using Perl, but I am only interested in the beginning and the end of the pattern. To be more specific, I have a sequence of letters that is 23 characters long, and I would like to check whether it starts and ends with particular letters.

For example, I would like to extract anything that starts with ab and ends with zt. So it can be

abaaaaaaaaaaaaaaaaaaazt   

So that it detects this match but not

abaaaaaaaaaaaaaaaaaaazz   

So far I tried

if ($line =~ /ab[*]zt/) {
    print "found pattern ";
}

thanks


Solution

  • * is a quantifier and meta character. Inside a character class bracket [ .. ] it just means a literal asterisk. You are probably thinking of .* which is a wildcard followed by the quantifier.

    Matching entire string, e.g. "abaazt".

    /^ab.*zt$/
    

    Note the anchors ^ and $, and the wildcard character . followed by the zero or more * quantifier.
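
    As a minimal sketch of the anchored pattern above, the script below runs it against the two sample sequences from the question. The helper name is_full_match is hypothetical and only serves the illustration.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: true if the whole string starts with "ab"
    # and ends with "zt", with anything (or nothing) in between.
    sub is_full_match {
        my ($line) = @_;
        return $line =~ /^ab.*zt$/ ? 1 : 0;
    }

    # The two sample sequences from the question.
    for my $line ("abaaaaaaaaaaaaaaaaaaazt", "abaaaaaaaaaaaaaaaaaaazz") {
        printf "%s: %s\n", $line, is_full_match($line) ? "match" : "no match";
    }
    ```

    The first sequence should be reported as a match and the second, which ends in zz, as no match.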

    Match substrings inside another string, e.g. "a b abaazt c d"

    /\bab\S*zt\b/
    

    This uses the word boundary \b to mark the beginning and end instead of anchors. Because \b also matches next to punctuation (for example between - and a), you can be more specific:

    /(?<!\S)ab\S*zt(?!\S)/
    

    Using a double negation to assert that no non-whitespace characters follow or precede the target text.
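
    To make the difference between the two substring patterns concrete, here is a small sketch that applies both to the same line. The helper names and the sample text (including the x-abcczt token) are made up for illustration.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helpers: each returns every candidate found in the text.
    # In list context, m//g without capture groups returns all matches.
    sub find_with_word_boundary {
        my ($text) = @_;
        return $text =~ /\bab\S*zt\b/g;
    }

    sub find_with_lookarounds {
        my ($text) = @_;
        return $text =~ /(?<!\S)ab\S*zt(?!\S)/g;
    }

    # Made-up sample: "abcczt" is glued to "x-" and is not a standalone token.
    my $text = "a b abaazt c x-abcczt d";
    print "word boundary: @{[ find_with_word_boundary($text) ]}\n";
    print "lookarounds:   @{[ find_with_lookarounds($text) ]}\n";
    ```

    The \b version finds both abaazt and abcczt, because there is a word boundary between - and a. The lookaround version finds only abaazt, since it requires whitespace or the edge of the string on both sides.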

    It is also possible to use the substr function

    if (substr($string, 0, 2) eq "ab" and substr($string, -2) eq "zt") 
    

    You mention that the string is 23 characters, and if that is a fixed length, you can get even more specific, for example

    /^ab.{19}zt$/
    

    This matches exactly 19 characters between ab and zt (2 + 19 + 2 = 23). The syntax for the {} quantifier is {min,max}. Leaving max blank means there is no upper limit, i.e. {1,} is the same as + and {0,} is the same as *, meaning one or more and zero or more matches, respectively.
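
    A short sketch of the fixed-length variant, assuming the sequence really is always 23 characters. The test strings are built with the x repetition operator so their lengths are explicit, and the helper name is hypothetical.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: true only for "ab" + exactly 19 characters + "zt".
    sub is_fixed_length_match {
        my ($line) = @_;
        return $line =~ /^ab.{19}zt$/ ? 1 : 0;
    }

    my $exact     = "ab" . ("a" x 19) . "zt";   # 2 + 19 + 2 = 23 characters
    my $too_short = "ab" . ("a" x 18) . "zt";   # 22 characters
    my $too_long  = "ab" . ("a" x 20) . "zt";   # 24 characters

    for my $line ($exact, $too_short, $too_long) {
        printf "%2d chars: %s\n", length($line),
            is_fixed_length_match($line) ? "match" : "no match";
    }
    ```

    Only the 23-character string matches; the 22- and 24-character strings are rejected even though they start with ab and end with zt.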