c++ regex boost-regex

Why does this regex match more than the 10 characters I want?


I have the following regular expression and I am using https://www.regextester.com/ to test it.

^(?=^.{10})[a-zA-Z]+:[0-9]+\s*

The requirement is that the input consists of alpha characters and numbers separated by a colon, optionally followed by trailing whitespace. The input must start with the alpha characters, but it may have superfluous characters after the trailing whitespace or the last number, and I don't want to match anything past the 10th character. The matched string must be exactly 10 characters long. In the example strings below, the comments note what I expected to match. I am not anchoring with a $ at the end because the input string will likely have more than 10 characters, so I am not trying to check that the entire string matches.

A:12345678 // matches which is fine

A:123456789 // Should only match up to the 8

FOO:567890s123 // should only match up to the 0

The actual result is that it also matches everything after the 10th character, as long as those characters are digits or whitespace. I expect it to match up to the 10th character and nothing more. How do I fix this expression?

Update: I will eventually try to incorporate this regex into a C++ program using a boost regex to match.


Solution

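  • The lookahead (?=^.{10}) only asserts that at least 10 characters are present at the start of the string; it does not limit how many characters the rest of the pattern consumes. If lookbehind is supported, you can instead assert at the end of the pattern that exactly 10 characters lie to the left: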

    ^[A-Za-z]+:[0-9]+(?<=^.{10})
    

    The pattern matches:

    • ^ Start of string
    • [A-Za-z]+:[0-9]+ Match 1+ chars A-Za-z followed by : and 1+ digits
    • (?<=^.{10}) Positive lookbehind, assert that exactly 10 characters lie between the start of the string and the current position

    If you want to match trailing whitespace chars:

    ^[A-Za-z]+:[0-9]+\s*(?<=^.{10})
    

    Note that \s can also match a newline.
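
If lookbehind is not available (for example, the default ECMAScript grammar of std::regex does not support it), the same requirement can be approximated without any lookaround: cut the input to its first 10 characters and require that whole prefix to match. This is a minimal sketch of that alternative, not the lookbehind technique above; the helper name match_prefix and its width parameter are hypothetical and not part of the original post, and it uses std::regex rather than Boost.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper (not from the original post): returns the first
// `width` characters of `input` if they consist of letters, a colon,
// digits and optional trailing whitespace; returns an empty string otherwise.
std::string match_prefix(const std::string& input, std::size_t width = 10) {
    // Fewer than `width` characters can never satisfy the requirement.
    if (input.size() < width) {
        return {};
    }
    static const std::regex pattern(R"([A-Za-z]+:[0-9]+\s*)");
    const std::string prefix = input.substr(0, width);
    // regex_match requires the entire prefix to match, so no anchors are needed.
    return std::regex_match(prefix, pattern) ? prefix : std::string{};
}
```

With the examples from the question, match_prefix("A:123456789") yields "A:12345678" and match_prefix("FOO:567890s123") yields "FOO:567890". An input whose first 10 characters contain anything outside the pattern, such as "FOO:5678s0123", yields an empty string.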