Regex to match 3 numbers


I'm trying to create a regex that matches 3 digits under certain conditions. I've made several attempts, but I can't get it to work with a single expression: I get either false positives or the wrong numbers matched.

In words: I want to match ANY 3 digits that are

  • Appearing at the start of a string
  • Appearing somewhere inside the string
  • (End of the string is NOT possible)

IF:

  • There is no other 3-digit group matching these conditions (otherwise the match is ambiguous)
  • The group is not followed by "p" or "i"
  • The group is not preceded by "x"

Examples (the number in parentheses is what I want to match):

  • This is (321) an example.
  • (321) also
  • including (321) //basically not possible, but can't hurt.
  • this (321) has another group with a p: 122p
  • this (321) has another group with a I: 123i
  • this x235 should be ignored cause (123) is what i want to match.
  • (123) is what i want, not x111 or 125p or 999i
  • in this 111 case there is no solution 555

(I need the capture split as (1 digit)(2 digits), but that is only a small modification of a 3-digit match.)

My last attempt looked like this:

(?:[^x]|^)(\d{1})(\d{2})[^pi]

However, it fails on the last case. I tried to cover this by checking preg_match_all(...) === 1 to make sure that only one result is matched.

However, a test string like "101 202" now passes: the first match is "101 " (including the trailing whitespace), after which 202 no longer matches, so the check assumes that 101 is the only valid solution, which is wrong.
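Both failures can be reproduced in a few lines. The following is only a sketch in Python (chosen for illustration; the pattern is transcribed unchanged), and `matches` is a hypothetical helper that stands in for counting the results of preg_match_all:

```python
import re

# The pattern from the question, transcribed unchanged.
ATTEMPT = re.compile(r"(?:[^x]|^)(\d{1})(\d{2})[^pi]")

def matches(text):
    """Return every full match, roughly what preg_match_all would count."""
    return [m.group(0) for m in ATTEMPT.finditer(text)]

# The trailing [^pi] must consume a character, so a group at the very end
# of the string ("555") is never seen as a candidate.
print(matches("in this 111 case there is no solution 555"))  # [' 111 ']

# The space after "101" is consumed by [^pi], so nothing is left for
# (?:[^x]|^) to match in front of "202".
print(matches("101 202"))  # ['101 ']
```

In both cases exactly one match is reported, so a "count equals 1" check accepts strings that should be rejected as ambiguous.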

Any idea?

Note: It should work across different regex engines, no matter whether PHP, JavaScript, Java, .NET or Ook! :)


Solution

  • We can write the numbers you are looking for like this:

    re_n = (?:[^x]|^)\d\d\d(?:[^ip]|$)
    

    Then the whole expression is:

    ^(?!.*re_n.*re_n.*$).*(re_n)
    

    The negative lookahead right after the start-of-line anchor rejects any line that contains two such numbers; the rest of the expression then matches the single valid number.

    The interpolated expression looks ugly:

    /^(?!.*(?:(?:[^x]|^)\d\d\d(?:[^ip]|$)).*(?:(?:[^x]|^)\d\d\d(?:[^ip]|$)).*$).*((?:(?:[^x]|^)\d\d\d(?:[^ip]|$)))/
    

    This Perl code:

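    use strict;
    use warnings;

    # A 3-digit run not preceded by "x" and not followed by "i" or "p".
    my $re_n = qr/(?:[^x]|^)\d\d\d(?:[^ip]|$)/;

    while (<DATA>) {
        chomp;
        # Reject lines with two candidates, then capture the remaining one.
        # $1 also holds the context character(s) consumed around the digits.
        if (/^(?!.*$re_n.*$re_n.*$).*($re_n)/) {
            print "$_: $1\n";
        } else {
            print "$_: NONE\n";
        }
    }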
    
    __DATA__
    This is 321 an example.
    321 also
    including 321 //basically not possible, but can't hurt.
    this 321 has another group with a p: 122p
    this 321 has another group with a I: 123i
    this x235 should be ignored cause 123 is what i want to match.
    123 is what i want, not x111 or 125p or 999i
    in this 111 case there is no solution 555
    

    Produces:

    This is 321 an example.:  321 
    321 also: 321 
    including 321 //basically not possible, but can't hurt.:  321 
    this 321 has another group with a p: 122p:  321 
    this 321 has another group with a I: 123i:  321 
    this x235 should be ignored cause 123 is what i want to match.:  123 
    123 is what i want, not x111 or 125p or 999i: 123 
    in this 111 case there is no solution 555: NONE
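One caveat, which matters for the "101 202" case raised in the question: re_n consumes the character before and after the digits, so two groups separated by a single character share that character and cannot both be found by the lookahead. The sketch below, in Python for illustration only, transcribes the expression above and compares it with a variant that states the same conditions as zero-width lookarounds. The variant is an assumption and not part of the original answer; `CORE`, `VARIANT` and `find_group` are hypothetical names. Lookbehind is not available in every engine (older JavaScript engines lack it, for example), so the variant trades some portability for handling adjacent groups.

```python
import re

# The answer's building block and full expression, transcribed unchanged.
RE_N = r"(?:[^x]|^)\d\d\d(?:[^ip]|$)"
ANSWER = re.compile(r"^(?!.*" + RE_N + r".*" + RE_N + r".*$).*(" + RE_N + r")")

# Assumed variant: the same conditions written as lookarounds, so no
# neighbouring character is consumed and only the digits are captured.
CORE = r"(?<!x)\d{3}(?![ip])"
VARIANT = re.compile(r"^(?!.*" + CORE + r".*" + CORE + r").*?(" + CORE + r")")

def find_group(pattern, text):
    """Return the captured group, or None if the line is rejected."""
    m = pattern.search(text)
    return m.group(1) if m else None

print(repr(find_group(ANSWER, "101 202")))   # ' 202' (ambiguity missed)
print(repr(find_group(VARIANT, "101 202")))  # None
print(repr(find_group(VARIANT, "this x235 should be ignored cause 123 is what i want to match.")))  # '123'
```

To get the (1 digit)(2 digits) split asked for in the question, the captured part of the variant could be written as `(\d)(\d{2})` with the same lookarounds around it.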