
Perl pattern matching "nothing"/empty


This is driving me nuts!

  1. I read a txt file into a string called $filestring.

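    use Fcntl qw(O_RDONLY);   # sysopen needs Fcntl for the O_RDONLY constant
    sysopen(my $handle, $filepath, O_RDONLY) or die "Cannot open $filepath: $!";
    my $filestring = do { local $/ = undef; <$handle> };   # slurp the whole file
    close($handle);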
    
  2. I made a pattern variable called $regex which is generated dynamically, but takes on the format:

    (a)|(b)|(c)
    
  3. I search the text for two patterns separated by whitespace:

    while($filestring =~ m/($regex)\s($regex)/g){
       print "Match: $1 $2\n";
       #...more stuff
    }
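A minimal, self-contained reproduction of steps 2 and 3 might look like the sketch below. It assumes a hypothetical word list (",", "and", "or") in place of the dynamically generated pattern and an inline string in place of the file; each alternative is wrapped in its own parentheses to match the (a)|(b)|(c) format.

```perl
use strict;
use warnings;

# Hypothetical alternatives; the real $regex is generated dynamically.
my @words = (',', 'and', 'or');

# Wrap each alternative in its own capturing group: "(\,)|(and)|(or)".
my $regex = join '|', map { '(' . quotemeta($_) . ')' } @words;

# Collect [$1, $2] for every match, as the while loop in step 3 does.
sub find_pairs {
    my ($text, $re) = @_;
    my @pairs;
    while ($text =~ m/($re)\s($re)/g) {
        push @pairs, [$1, $2];
    }
    return @pairs;
}

my @pairs = find_pairs('cats and , dogs', $regex);
for my $p (@pairs) {
    # Print undef explicitly instead of letting it interpolate as "".
    printf "Match: %s %s\n", $p->[0], defined $p->[1] ? $p->[1] : '(undef)';
}
```

With this sample text the only match is "and ,", yet $2 comes back undef; interpolated into "Match: $1 $2\n" it would print as the one-word "Match: and " line.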
    

Most of the matches are valid, but for some reason every once in a while I get a match like the following:

Match: and 

whereas a normal match should print two words, like this:

Match: , and

Does anyone know what might be causing this?

EDIT: it appears that the NULL character is being matched in the pattern.


Solution

  • Each of the alternatives in your regexp is a separate capture group. Once $regex is interpolated into m/($regex)\s($regex)/, the whole regexp looks like:

    ((a)|(b)|(c))\s((a)|(b)|(c))
    12   3   4     56   7   8
    

    I've marked each opening parenthesis with its capture group number.

    So if $filestring is "b a", $1 will be "b", but $2 will be undef (which interpolates as an empty string) because nothing matched (a). The second word ends up in $5, not $2.
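    A quick way to check the numbering is to run the expanded pattern against "b a" in list context, which returns every capture group in order. This sketch uses the literal letters a, b and c as stand-ins for the real alternatives.

```perl
use strict;
use warnings;

my $text = 'b a';

# In list context a successful match returns all capture groups in order,
# with undef for any group that did not take part in the match.
my @caps = $text =~ m/((a)|(b)|(c))\s((a)|(b)|(c))/;

# Prints: $1 = 'b', $2 = undef, $3 = 'b', $4 = undef,
#         $5 = 'a', $6 = 'a', $7 = undef, $8 = undef (one per line)
for my $i (0 .. $#caps) {
    printf "\$%d = %s\n", $i + 1, defined $caps[$i] ? "'$caps[$i]'" : 'undef';
}
```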

    To avoid this, use non-capturing groups (?:...) for the alternatives, so that only the two outer groups capture and $1 and $2 always hold the two words:

    ((?:a)|(?:b)|(?:c))\s((?:a)|(?:b)|(?:c))
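    If the pattern is built dynamically, the fix can be applied where the alternatives are joined. The sketch below assumes the same hypothetical word list as before (",", "and", "or") and an inline string instead of the file; quotemeta is used so that punctuation in a word is matched literally.

```perl
use strict;
use warnings;

my @words = (',', 'and', 'or');   # hypothetical alternatives

# Non-capturing groups: "(?:\,)|(?:and)|(?:or)". Only the two outer
# groups added in the match below capture, so they are always $1 and $2.
my $regex = join '|', map { '(?:' . quotemeta($_) . ')' } @words;

my $text = 'cats and , dogs';
my @pairs;
while ($text =~ m/($regex)\s($regex)/g) {
    push @pairs, [$1, $2];
    print "Match: $1 $2\n";
}
```

    Since the alternatives no longer need grouping individually, joining them as (?:\,|and|or) or even plain \,|and|or behaves the same inside the outer ($regex) groups.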