ios swift regex nspredicate nsregularexpression

How to escape a dynamic regex in Swift?


I'm getting a hex string like this from the API:

2f5e28285b5e3c3e28295b5c5d5c5c2e2c3b3a5c7340225d2b285c2e5b5e3c3e28295b5c5d5c5c2e2c3b3a5c7340225d2b292a297c28222e2b2229294028285c5b5b302d395d7b312c337d5c2e5b302d395d7b312c337d5c2e5b302d395d7b312c337d5c2e5b302d395d7b312c337d5d297c28285b612d7a412d5a5c2d302d395d2b5c2e292b5b612d7a412d5a5d7b322c7d2929242f

Once decoded to a UTF-8 string, this is the regex it forms:

/^(([^<>()[\]\\.,;:\s@"]+(\.[^<>()[\]\\.,;:\s@"]+)*)|(".+"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/
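For reference, the decoding step can be sketched roughly like this. The `hexadecimal` property used further down isn't shown in the question, so `dataFromHex` here is a hypothetical stand-in, not the actual extension:

```swift
import Foundation

// Hypothetical helper (not from the question): decodes a hex string into raw bytes.
// Returns nil if the length is odd or a pair cannot be parsed as a base-16 byte.
func dataFromHex(_ hex: String) -> Data? {
    guard hex.count % 2 == 0 else { return nil }
    var bytes = [UInt8]()
    bytes.reserveCapacity(hex.count / 2)
    var index = hex.startIndex
    while index < hex.endIndex {
        let next = hex.index(index, offsetBy: 2)
        guard let byte = UInt8(hex[index..<next], radix: 16) else { return nil }
        bytes.append(byte)
        index = next
    }
    return Data(bytes)
}

// First four bytes of the API string: 2f 5e 28 28
let decoded = dataFromHex("2f5e2828").flatMap { String(data: $0, encoding: .utf8) }
print(decoded ?? "decoding failed") // prints "/^(("
```

Note that the decoded string begins with `/` (and the full one also ends with it): those are JavaScript-style delimiters, not part of the pattern itself.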

This is a valid email regex according to some online regex validators. The problem is how to escape this string. I've tried the following code:

if let data = emailRegex.hexadecimal, let string = String(data: data, encoding: .utf8) {
    guard NSPredicate(format: "SELF MATCHES %@", NSRegularExpression.escapedPattern(for: string))
        .evaluate(with: email) else {
        throw ValidationError.invalidInput
    }

    isValid = true
} else {
    throw ValidationError.missingInput
}

This results in the following escaped regex:

\\/\\^\\(\\(\\[\\^<>\\(\\)\\[\\\\]\\\\\\\\\\.,;:\\\\s@\"]\\+\\(\\\\\\.\\[\\^<>\\(\\)\\[\\\\]\\\\\\\\\\.,;:\\\\s@\"]\\+\\)\\*\\)\\|\\(\"\\.\\+\"\\)\\)@\\(\\(\\\\\\[\\[0-9]\\{1,3\\}\\\\\\.\\[0-9]\\{1,3\\}\\\\\\.\\[0-9]\\{1,3\\}\\\\\\.\\[0-9]\\{1,3\\}]\\)\\|\\(\\(\\[a-zA-Z\\\\-0-9]\\+\\\\\\.\\)\\+\\[a-zA-Z]\\{2,\\}\\)\\)\\$\\/

The escaped regex above gives wrong results: it fails validation even for valid emails. Any help will be appreciated!
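A small sketch of why this happens: `NSRegularExpression.escapedPattern(for:)` backslash-escapes the metacharacters, so the result matches the pattern's own text literally and no longer works as a regex. The sample pattern and the `matches` helper below are made up for illustration, and the helper uses an unanchored search rather than the whole-string `MATCHES` of `NSPredicate`:

```swift
import Foundation

// Hypothetical helper: true if the pattern compiles and matches somewhere in input.
func matches(_ pattern: String, _ input: String) -> Bool {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return false }
    let range = NSRange(input.startIndex..., in: input)
    return regex.firstMatch(in: input, options: [], range: range) != nil
}

// Illustrative pattern, much simpler than the one in the question.
let pattern = #"[a-z]+@[a-z]+\.com"#
let escaped = NSRegularExpression.escapedPattern(for: pattern)

print(matches(pattern, "user@example.com")) // true: used as a regex
print(matches(escaped, "user@example.com")) // false: metacharacters are now literals
print(matches(escaped, pattern))            // true: it only matches the pattern text itself
```

So escaping the whole string is the wrong tool when the string is itself meant to be the regex.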

Edit 1: Updated code to

let string = String(String(data: data, encoding: .utf8)!.dropFirst().dropLast())

But the compiler crashes on the following (the screenshot is not preserved).
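Since the exact error isn't visible, one way to see what the regex engine objects to is to compile the pattern with `NSRegularExpression(pattern:)`, which throws a Swift error you can inspect. As far as I know, `NSPredicate` with `MATCHES` instead raises an Objective-C exception on an invalid pattern, which would look like a crash. The sketch below uses only a fragment of the decoded regex, and `compileError` is a hypothetical helper name:

```swift
import Foundation

// Hypothetical helper: returns the compile error for a pattern, or nil if it compiles.
func compileError(for pattern: String) -> Error? {
    do {
        _ = try NSRegularExpression(pattern: pattern)
        return nil
    } catch {
        return error
    }
}

// Fragment of the decoded regex: an unescaped "[" inside a character class.
let unescapedBracket = #"[^<>()[\]]+"#
// Same fragment with the inner "[" escaped.
let escapedBracket = #"[^<>()\[\]]+"#

if let error = compileError(for: unescapedBracket) {
    print("unescaped fragment failed to compile: \(error.localizedDescription)")
}
print("escaped fragment compiles: \(compileError(for: escapedBracket) == nil)")
```

The ICU syntax used by `NSRegularExpression` treats an unescaped `[` inside a class as the start of a nested set, so a pattern written for JavaScript can end up with an unclosed bracket.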


Solution

  • Use the following pattern to find an unescaped [ that sits inside a character class:

    ((?<!\\)(?:\\\\)*\[(?:\\.|[^\]\[])*)\[
    

    Replacement: $1\\[ (this puts a backslash in front of the inner [). See regex proof.

    EXPLANATION

    --------------------------------------------------------------------------------
      (                        group and capture to \1:
    --------------------------------------------------------------------------------
        (?<!                     look behind to see if there is not:
    --------------------------------------------------------------------------------
          \\                       '\'
    --------------------------------------------------------------------------------
        )                        end of look-behind
    --------------------------------------------------------------------------------
        (?:                      group, but do not capture (0 or more
                                 times (matching the most amount
                                 possible)):
    --------------------------------------------------------------------------------
          \\                       '\'
    --------------------------------------------------------------------------------
          \\                       '\'
    --------------------------------------------------------------------------------
        )*                       end of grouping
    --------------------------------------------------------------------------------
        \[                       '['
    --------------------------------------------------------------------------------
        (?:                      group, but do not capture (0 or more
                                 times (matching the most amount
                                 possible)):
    --------------------------------------------------------------------------------
          \\                       '\'
    --------------------------------------------------------------------------------
          .                        any character except \n
    --------------------------------------------------------------------------------
         |                        OR
    --------------------------------------------------------------------------------
          [^\]\[]                  any character except: '\]', '\['
    --------------------------------------------------------------------------------
        )*                       end of grouping
    --------------------------------------------------------------------------------
      )                        end of \1
    --------------------------------------------------------------------------------
      \[                       '['
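Putting it together in Swift, a minimal sketch might look like the following. It assumes the string has already been decoded and the leading and trailing `/` removed (as in Edit 1); `escapingNestedBrackets` and `isValidEmail` are hypothetical names, not part of any API:

```swift
import Foundation

// Hypothetical helper: applies the pattern and replacement above to escape
// an unescaped "[" found inside a character class.
func escapingNestedBrackets(in pattern: String) throws -> String {
    let fixer = try NSRegularExpression(pattern: #"((?<!\\)(?:\\\\)*\[(?:\\.|[^\]\[])*)\["#)
    let range = NSRange(pattern.startIndex..., in: pattern)
    // In the template, "\\" stands for one literal backslash.
    return fixer.stringByReplacingMatches(in: pattern, options: [], range: range,
                                          withTemplate: #"$1\\["#)
}

// Hypothetical helper: fixes the pattern, compiles it, and tests the email.
// Returns false if the pattern still fails to compile.
func isValidEmail(_ email: String, rawPattern: String) -> Bool {
    guard let fixed = try? escapingNestedBrackets(in: rawPattern),
          let regex = try? NSRegularExpression(pattern: fixed) else { return false }
    let range = NSRange(email.startIndex..., in: email)
    return regex.firstMatch(in: email, options: [], range: range) != nil
}

// Decoded regex from the question, without the surrounding "/" delimiters.
let source = #"^(([^<>()[\]\\.,;:\s@"]+(\.[^<>()[\]\\.,;:\s@"]+)*)|(".+"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$"#

print(isValidEmail("user@example.com", rawPattern: source))
print(isValidEmail("not an email", rawPattern: source))
```

Note that `NSRegularExpression.escapedPattern(for:)` is not used here at all: the pattern is passed through as a regex, with only the offending brackets escaped. A single pass escapes one inner `[` per character class, which is enough for this pattern.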