Tags: php, regex, preg-match

preg_match returning "extra" empty matches for new lines


The following expression returns what I need, BUT it also gives an extra empty match for each line, as well as for any blank lines. As a result, 5 valid text lines return 10 matches. I suspect the problem is in the way I'm specifying the last capture group, or that I'm not making capture group #2 required.

How can I "ignore" the newline character (or whatever is triggering the extra match)?

/(\d+[a-z]?\.)?[ ]?(.*)/g

11a. A numbered agenda item
Unnumbered agenda item
12. Another numbered agenda item
Another UNnumbered agenda item
13. A numbered agenda item

I need to extract the Agenda Item text, AND the preceding number (if present).

Demo at https://regex101.com/r/vB0H5s/1
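A minimal PHP sketch that reproduces the problem is shown below. It assumes that the `g` flag from the regex101 demo corresponds to calling `preg_match_all()` (PHP itself has no `g` modifier). It also assumes that the sample lines are joined with `\n` and have no trailing newline. `PREG_OFFSET_CAPTURE` is added so that every match also reports its byte offset.

```php
<?php
// Sample agenda text from the question; "\n" line endings are an assumption.
$text = implode("\n", [
    '11a. A numbered agenda item',
    'Unnumbered agenda item',
    '12. Another numbered agenda item',
    'Another UNnumbered agenda item',
    '13. A numbered agenda item',
]);

// Original pattern without /g: preg_match_all() already returns every match.
$pattern = '/(\d+[a-z]?\.)?[ ]?(.*)/';

// PREG_SET_ORDER: one array per match; PREG_OFFSET_CAPTURE: each group is [text, offset].
$count = preg_match_all($pattern, $text, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

echo "Matches: $count\n";
foreach ($matches as $m) {
    [$full, $offset] = $m[0];
    printf("offset %3d: %s\n", $offset, $full === '' ? '(empty)' : $full);
}
```

On this input it reports 10 matches. Each real match is followed by an empty one, located at the offset of a newline (27, 50, 83, 114) or at the end of the string (141).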


Solution

  • In your pattern, every part is optional. The quantifiers ? and * allow each part to match nothing, so the whole pattern can match an empty string.

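    You get 10 matches instead of 5 because of those empty matches. After (.*) consumes the text of a line, it stops in front of the newline, since . does not match \n by default. The engine then tries again at that same position, and because all parts are optional, the pattern succeeds there with an empty match. The same happens at the very end of the string, so each of the 5 lines yields one real match plus one empty one, and a blank line yields a single empty match.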

    Use (.+) instead, so the second capture group must match 1 or more characters. The pattern can then no longer match an empty string, and the empty matches disappear.

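    If each match should start at the beginning of a line, you can also add the anchor ^. Use it together with the m (multiline) modifier, so that ^ matches after every newline and not only at the start of the string: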

    ^(\d+[a-z]?\.)?[ ]?(.+)
    

    See a regex demo
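Below is a sketch of the fix applied in PHP with `preg_match_all()`. It uses the same `\n`-separated sample text, plus one blank line to show that blank lines are skipped. The `m` modifier makes `^` match at the start of every line. The helper name `parseAgenda` is hypothetical, used only for illustration.

```php
<?php
/**
 * Hypothetical helper: split agenda text into [number|null, text] pairs
 * using the anchored pattern from the answer.
 */
function parseAgenda(string $text): array
{
    // ^ with /m anchors each match at a line start; (.+) forbids empty matches,
    // so line ends and blank lines produce no match at all.
    $pattern = '/^(\d+[a-z]?\.)?[ ]?(.+)/m';

    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

    $items = [];
    foreach ($matches as $m) {
        // Group 1 comes back as '' when the optional number did not participate.
        $number = $m[1] !== '' ? $m[1] : null;
        $items[] = [$number, $m[2]];
    }
    return $items;
}

$text = "11a. A numbered agenda item\n"
      . "Unnumbered agenda item\n"
      . "\n"                       // blank line: produces no match
      . "12. Another numbered agenda item\n"
      . "Another UNnumbered agenda item\n"
      . "13. A numbered agenda item";

foreach (parseAgenda($text) as [$number, $item]) {
    printf("%-5s %s\n", $number ?? '-', $item);
}
```

Passing `PREG_UNMATCHED_AS_NULL` (PHP 7.2+) would report the missing number as `null` directly, replacing the `''` check. If the input may use `\r\n` line endings, keep in mind that `.` also matches `\r` by default. The captured text would then keep a trailing `\r`, which `rtrim()` can remove.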