php regex preg-match

Set length limits to specific character class parts in Unicode regular expression


Below is my regular expression:

preg_match('/^[\p{L}\p{N} @]+$/u', $string);

My goal is to set a minimum and maximum length for \p{L}, \p{N}, @ and the whole string. I tried putting {min, max} after \p{L} and after each of the other parts, but it doesn't work.


Solution

  • You can set the minimum and maximum length of a pattern by placing a limiting quantifier right after the subpattern that you need to repeat.

    The trick here is counting occurrences that are not consecutive. It can be done with negated character classes inside lookaheads at the beginning of the pattern.

    Here is an example regex that requires exactly 4 letters (\p{L}), 5 to 6 numbers (\p{N}), and exactly three @:

    ^(?=(?:[^\n\p{L}]*\p{L}){4}[^\n\p{L}]*$)(?=(?:[^\n\p{N}]*\p{N}){5,6}[^\n\p{N}]*$)(?=(?:[^\n@]*@){3}[^\n@]*$)[\p{L}\p{N} @]+$
    


    Note that \n can be removed if you are not planning to use multiline mode (m flag).
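
    To see why the \n matters once the m flag is on, here is a small sketch. The subject string and variable names are made up for illustration. The second pattern is the same regex with every \n stripped from the negated classes, so its lookaheads can count characters across line breaks.

    ```php
    <?php
    // Illustrative subject: line 1 has only 2 letters, line 2 adds 2 more.
    $subject = "ab12345@@@\ncd";

    // The regex from this answer, with delimiters and the m (multiline) and u flags.
    $strict = '/^(?=(?:[^\n\p{L}]*\p{L}){4}[^\n\p{L}]*$)'
            . '(?=(?:[^\n\p{N}]*\p{N}){5,6}[^\n\p{N}]*$)'
            . '(?=(?:[^\n@]*@){3}[^\n@]*$)'
            . '[\p{L}\p{N} @]+$/mu';

    // Same regex without \n in the negated classes ('\n' in single quotes
    // is the two characters backslash + n, exactly as written in the pattern).
    $loose = str_replace('\n', '', $strict);

    var_dump(preg_match($strict, $subject)); // int(0): neither line has 4 letters
    var_dump(preg_match($loose, $subject));  // int(1): letters on line 2 get counted for line 1
    ```

    Without the m flag, ^ and $ only anchor at the start and end of the subject, so the two patterns behave the same on single-line input.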

    The 3 conditions are inside look-aheads:

    • (?=(?:[^\n\p{L}]*\p{L}){4}[^\n\p{L}]*$) - This lookahead matches, from the beginning of the input string, any run of non-letters followed by a letter, 4 times (you may set any other limits here), and then allows only non-letters up to the end (if it finds another letter, it fails).
    • (?=(?:[^\n\p{N}]*\p{N}){5,6}[^\n\p{N}]*$) - a similar lookahead, but now we match non-numbers followed by a number, 5 or 6 times, and make sure there are no further numbers after that.
    • (?=(?:[^\n@]*@){3}[^\n@]*$) - same logic for @.
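
    The three lookaheads above can be tried out in PHP with a sketch like the following. The function name matchesLimits and the sample strings are hypothetical. The pattern is the one from this answer, wrapped in / delimiters with the u flag, as in the question.

    ```php
    <?php
    // Hypothetical helper: true if $input satisfies all three count limits
    // and contains only letters, numbers, spaces and @.
    function matchesLimits(string $input): bool
    {
        $pattern = '/^'
            . '(?=(?:[^\n\p{L}]*\p{L}){4}[^\n\p{L}]*$)'    // exactly 4 letters
            . '(?=(?:[^\n\p{N}]*\p{N}){5,6}[^\n\p{N}]*$)'  // 5 to 6 numbers
            . '(?=(?:[^\n@]*@){3}[^\n@]*$)'                // exactly 3 @
            . '[\p{L}\p{N} @]+$/u';                        // allowed characters only

        return preg_match($pattern, $input) === 1;
    }

    var_dump(matchesLimits('abcd12345@@@'));      // bool(true)
    var_dump(matchesLimits('ab cd 123456 @@@'));  // bool(true)
    var_dump(matchesLimits('abcde12345@@@'));     // bool(false): 5 letters
    var_dump(matchesLimits('abcd1234@@@'));       // bool(false): only 4 numbers
    var_dump(matchesLimits('abcd12345@@'));       // bool(false): only 2 @
    ```

    Single quotes are used so that PHP passes \n and \p{...} to the regex engine untouched.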

    If you only need a minimum threshold, drop the negated character class and the $ at the end of the lookahead. For example, (?=(?:[^\n@]*@){3}) requires at least 3 @s and places no upper limit on them.
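
    As a minimal sketch of the minimum-only variant, the pattern below keeps just the @ lookahead and the final character class. The variable name and test strings are illustrative.

    ```php
    <?php
    // At least 3 @, no upper bound: the lookahead has no trailing [^\n@]*$.
    $atLeastThreeAt = '/^(?=(?:[^\n@]*@){3})[\p{L}\p{N} @]+$/u';

    var_dump(preg_match($atLeastThreeAt, 'a@b@c@'));  // int(1): exactly 3 @
    var_dump(preg_match($atLeastThreeAt, '@@@@@'));   // int(1): more than 3 is fine
    var_dump(preg_match($atLeastThreeAt, 'a@b@'));    // int(0): only 2 @
    ```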