php preg-match pcre regex-negation

Finding presence of chars or strings out of the allowed ones


I'm stuck: I cannot find the correct regex to pass to PHP's preg_match.

I have two strings. Say "mdo" and "o", but they could be really random.

I have a dictionary of allowed chars and strings.

For the example, allowed chars are "a-gm0-9", and allowed strings are "do" and "si".

THE GOAL

I'm trying to check that the input string doesn't contain any char or string but those in the dictionary, case-insensitive.

So "mdo" is valid (the regex must not match it): m is an allowed char and do is an allowed string. "o" is not valid: o is not an allowed char, and on its own it is not the whole allowed string do.

Where I'm stuck

It's easy enough to write the negations [^a-gm0-9] and (?!do|si) separately, but I cannot work out how to combine them in a single regex so that I can use the following PHP code:

<?php
  $inputStr = 'mdo';
  $regex = '/?????/i';    // the question subject
  // if disallowed chars/strings are found...
  if( preg_match($regex, $inputStr) == 1 )
    return false;     // the $inputStr is not valid
  return true;
?>

Two cascading preg_match calls would break the logic, so that approach doesn't work.

How can I combine the char check and the string check (as an "AND") in a single regex? Their positions in the input don't matter.


Solution

  • You can use this pattern:

    return (bool) preg_match('~^(?:do|si|[a-gm0-9])*+\C~i', $inputStr);
    

    The idea is to match all allowed chars and substrings from the start of the string in a repeated group with a possessive quantifier, and then to check whether a single byte (\C) remains. Since the quantifier is greedy and possessive, it consumes as much allowed content as it can and never gives any of it back, so a byte found after it cannot be allowed.
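As a rough illustration of the pattern above, here is a minimal sketch that wraps it in a helper and runs it on the inputs from the question. The function name hasDisallowed is hypothetical (it is not part of the original answer), and the dictionary is hard-coded to the example values "a-gm0-9", "do" and "si".

```php
<?php
// Hypothetical helper (name chosen for illustration only): returns true when
// $inputStr contains something that is not in the example dictionary.
function hasDisallowed(string $inputStr): bool
{
    // (?:do|si|[a-gm0-9])*+ consumes every allowed string/char from the start
    // and, being possessive, never gives anything back.
    // \C then succeeds only if at least one byte is left over.
    return preg_match('~^(?:do|si|[a-gm0-9])*+\C~i', $inputStr) === 1;
}

var_dump(hasDisallowed('mdo'));   // bool(false): "m" then "do" consume everything
var_dump(hasDisallowed('o'));     // bool(true): nothing is consumed, \C matches "o"
var_dump(hasDisallowed('SIdo9')); // bool(false): the i flag makes "SI" match si
var_dump(hasDisallowed('dox'));   // bool(true): "do" is consumed, "x" is left over
```

Note that an empty input yields false here, because there is no leftover byte for \C to match.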

    Note that most of the time it is simpler to negate the result of preg_match, for example:

    return (bool) !preg_match('~^(?:do|si|[a-gm0-9])*$~iD', $inputStr);
    

    (or with a + quantifier, if you don't want to allow empty strings)
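The negated variant can be sketched the same way. The snippet below is only an illustration under assumptions: the helper name isValidInput and its $allowEmpty parameter are hypothetical, and unlike the one-liner above it returns true when the input is valid, so the outer negation is not needed. It also shows why the D modifier is there: without it, $ would also match just before a trailing newline.

```php
<?php
// Hypothetical helper: true when $inputStr is built only from allowed pieces.
// $allowEmpty switches between the * and + quantifiers mentioned above.
function isValidInput(string $inputStr, bool $allowEmpty = true): bool
{
    $quantifier = $allowEmpty ? '*' : '+';
    // The D modifier makes $ match only at the very end of the subject,
    // so a trailing "\n" is not silently accepted.
    $regex = '~^(?:do|si|[a-gm0-9])' . $quantifier . '$~iD';
    return preg_match($regex, $inputStr) === 1;
}

var_dump(isValidInput('mdo'));     // bool(true)
var_dump(isValidInput('o'));       // bool(false)
var_dump(isValidInput(''));        // bool(true): * accepts the empty string
var_dump(isValidInput('', false)); // bool(false): + requires at least one piece
var_dump(isValidInput("mdo\n"));   // bool(false): thanks to the D modifier
```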