Tags: regex, regex-lookarounds, non-greedy

RegEx: How to apply a Character Set restriction to a whole Expression


Let's say I have a regex which is used to validate email addresses, such as:

/^(([^<>()\[\]\\.,;:\s@"]+(\.[^<>()\[\]\\.,;:\s@"]+)*)|(".+"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/

Now, let's say I also want to make sure that the following character set applies to the whole string too:

[\x00-\x7F]

How would I go about applying this second character set restriction to the whole pattern?

The expected result would be:

  • jake.howlett@howlett.house (passes)
  • jake.howẟlett@howlett.house (fails, as the ẟ is outside of the 2nd character set)
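To illustrate why the extra restriction is needed, here is a minimal sketch showing that the original pattern on its own accepts both addresses, since its negated character classes do not exclude ẟ. The question does not name a host language, so Python's `re` module is used here as an assumption, and `matches_original` is a hypothetical helper name.

```python
import re

# The email pattern from the question, without the /.../ delimiters.
# It is split across three raw strings only for readability.
EMAIL = re.compile(
    r'^(([^<>()\[\]\\.,;:\s@"]+(\.[^<>()\[\]\\.,;:\s@"]+)*)|(".+"))'
    r'@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])'
    r'|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$'
)


def matches_original(address: str) -> bool:
    """Return True if the address matches the unmodified pattern."""
    return EMAIL.search(address) is not None


if __name__ == "__main__":
    # Both addresses match: the local part only forbids a fixed list of
    # punctuation and whitespace, so non-ASCII letters are allowed through.
    print(matches_original("jake.howlett@howlett.house"))
    print(matches_original("jake.howẟlett@howlett.house"))
```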

Solution

  • You may add it in a positive lookahead after checking the start of string:

    ^(?=[\x00-\x7F]+$)your_pattern_here
     ^^^^^^^^^^^^^^^^^
    

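    After the start-of-string position is checked with ^, the lookahead (?=[\x00-\x7F]+$) is evaluated once and requires the whole string to consist only of ASCII characters (+ matches one or more occurrences, and $ asserts the end-of-string position). Because a lookahead consumes no characters, the rest of the pattern then matches from the start of the string exactly as before.

    The full regex then becomes: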

    ^(?=[\x00-\x7F]+$)(([^<>()\[\]\\.,;:\s@"]+(\.[^<>()\[\]\\.,;:\s@"]+)*)|(".+"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$
    

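The lookahead approach can be checked against the two addresses from the question. This is a sketch in Python's `re` module (an assumption, as the question does not name a host language); `ASCII_ONLY`, `EMAIL_BODY`, and `is_ascii_email` are hypothetical names introduced for illustration.

```python
import re

# Lookahead from the answer: the whole string must be ASCII-only.
ASCII_ONLY = r'(?=[\x00-\x7F]+$)'

# The original email pattern without its leading ^ anchor.
EMAIL_BODY = (
    r'(([^<>()\[\]\\.,;:\s@"]+(\.[^<>()\[\]\\.,;:\s@"]+)*)|(".+"))'
    r'@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])'
    r'|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$'
)

# ^ first, then the zero-width ASCII check, then the unchanged pattern.
ASCII_EMAIL = re.compile('^' + ASCII_ONLY + EMAIL_BODY)


def is_ascii_email(address: str) -> bool:
    """Return True if the address matches the pattern and is ASCII-only."""
    return ASCII_EMAIL.search(address) is not None


if __name__ == "__main__":
    print(is_ascii_email("jake.howlett@howlett.house"))   # ASCII address
    print(is_ascii_email("jake.howẟlett@howlett.house"))  # contains ẟ
```

Keeping the lookahead in its own string makes it easy to prepend the same restriction to other anchored patterns without touching their bodies.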