
How can I determine the position a pattern matched in a string?


I want to match a pattern with the format /vX.X.X/, where each X is a number. For example: /v1.1.1/ and /v1.0.300/. After matching the pattern, how can I get the position in the string where the match was found?


Solution

  • The approach you take depends on what you are trying to accomplish and the rest of the stuff around the problem. Since you haven't said anything about this, here's a shotgun blast of different ideas. Not all of them may be appropriate for what you are doing.


    The @- special variable has the offsets for the starting position of the match groups. The first element is the start of the entire match, the second element (index 1) is the start of the $1 match, and so on. If your pattern is the entire string you want, then you can use the first element in that array:

    use v5.10;  # enables say
    
    if( $string =~ /\bv\d+\.\d+\.\d+\b/ ) {
        my $position = $-[0];
        say "Position is $position";
        }
    

    If you have other stuff around your pattern and the part you want is in the first match group, you can use the second element (remember that match groups are numbered by the order of their opening parens):

    if( $string =~ /before (v\d+\.\d+\.\d+) after/ ) {
        my $position = $-[1];
        say "Position is $position";
        }
    

    When your pattern changes, you may need to update which element you use.

    There's also @+ that works the same but has the ending position. I have a bunch of examples of this in the first edition of Mastering Perl. I save it for that book because I find that many people get confused on which element corresponds to which part of the pattern. Consider if you'll remember this later.
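
    To show how @- and @+ line up with the groups, here's a minimal sketch. The match_spans helper, the extra capture groups, and the sample string are made up for illustration; they aren't from the question. It returns a start and end offset for the whole match and for each group, which you can hand to substr:

    ```perl
    use v5.10;
    use strict;
    use warnings;

    # Hypothetical helper: returns [start, end] pairs for the whole match
    # (index 0) and for each capture group (index 1 and up).
    sub match_spans {
        my( $string ) = @_;
        return unless $string =~ /\bv(\d+)\.(\d+)\.(\d+)\b/;
        # $#+ is the index of the last group in the pattern
        return map { [ $-[$_], $+[$_] ] } 0 .. $#+;
        }

    my $string = 'path /v1.0.300/ here';   # made-up sample input
    my @spans  = match_spans( $string );

    for my $n ( 0 .. $#spans ) {
        my( $start, $end ) = @{ $spans[$n] };
        # @+ holds the offset just past the end, so the length is end minus start
        my $text = substr( $string, $start, $end - $start );
        say "Group $n: $start to $end ($text)";
        }
    ```

    Element 0 covers the entire match, so the first line of output reports the span from 6 to 14 for v1.0.300.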

    You can use index to get the position of the matched string:

    if( $string =~ /\b(v\d+\.\d+\.\d+)\b/ ) {
        my $matched = $1;
        my $position = index( $string, $matched );
        say "Position is $position";
        }
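
    One thing to watch with index: it reports the first place the matched text occurs, which isn't necessarily where the pattern matched. Here's a small sketch with a made-up string where the same text shows up earlier, in a spot the \b anchor rejects:

    ```perl
    use v5.10;
    use strict;
    use warnings;

    # Made-up input: "v1.2.3" first appears inside "rev1.2.3", where there is
    # no word boundary before the v, so the pattern skips it.
    my $string = 'rev1.2.3 then v1.2.3';

    my( $by_index, $by_offset );
    if( $string =~ /\b(v\d+\.\d+\.\d+)\b/ ) {
        $by_index  = index( $string, $1 );   # first occurrence of the text
        $by_offset = $-[1];                  # where the group actually matched
        }

    say "index reports $by_index, the match started at $by_offset";
    ```

    If the two numbers can differ for your data, the offset from @- is the safer choice.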
    

    Using the /p flag and ${^PREMATCH} variable from Perl v5.10, count the positions before the matched part of the string:

    use v5.10;
    
    if( $string =~ /\bv\d+\.\d+\.\d+\b/p ) {
        my $position = length ${^PREMATCH};
        say "Position is $position";
        }
    

    Use the /g flag in scalar context and Perl remembers the string position where the match ended. Subtract the match length to see where the match started:

    if( $string =~ /\b(v\d+\.\d+\.\d+)\b/g ) {
        my $matched = $1;
        my $position = pos( $string ) - length( $matched );
        say "Position is $position";
        }
    

    If there can be multiple matches per string, you'll have to adjust these. One way uses a while loop, since its condition is still scalar context:

    while( $string =~ /\b(v\d+\.\d+\.\d+)\b/g ) {
        my $matched = $1;
        my $position = pos( $string ) - length( $matched );
        say "Position is $position";
        }
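
    The same loop works with @- instead of pos, since Perl resets @- and @+ on every successful match. This is a sketch, not the only way to do it; the version_positions helper is a hypothetical name, and the sample string is built from the two examples in the question:

    ```perl
    use v5.10;
    use strict;
    use warnings;

    # Hypothetical helper: returns a [position, text] pair for every
    # version string found in the input.
    sub version_positions {
        my( $string ) = @_;
        my @found;
        while( $string =~ /\bv\d+\.\d+\.\d+\b/g ) {
            # $-[0] is the start of the current match, $+[0] is just past its end
            push @found, [ $-[0], substr( $string, $-[0], $+[0] - $-[0] ) ];
            }
        return @found;
        }

    my $string = '/v1.1.1/ and /v1.0.300/';
    say "Found $_->[1] at position $_->[0]" for version_positions( $string );
    ```

    For this input it reports v1.1.1 at position 1 and v1.0.300 at position 14.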