regex, swift, escaping, nspredicate, nsregularexpression

Ignore escaped double quote characters swift


I am trying to validate a phone number using NSPredicate and a regex. The only problem is that when I set the regex, Swift thinks I am trying to escape parts of the string because of the backslashes. How can I get around this?

My code is as follows:

let phoneRegEx = "^((\(?0\d{4}\)?\s?\d{3}\s?\d{3})|(\(?0\d{3}\)?\s?\d{3}\s?\d{4})|(\(?0\d{2}\)?\s?\d{4}\s?\d{4}))(\s?\#(\d{4}|\d{3}))?$"

Solution

  • In regular Swift string literals, you need to double each backslash to define a literal backslash:

    let phoneRegEx = "^((\\(?0\\d{4}\\)?\\s?\\d{3}\\s?\\d{3})|(\\(?0\\d{3}\\)?\\s?\\d{3}\\s?\\d{4})|(\\(?0\\d{2}\\)?\\s?\\d{4}\\s?\\d{4}))(\\s?#(\\d{4}|\\d{3}))?$"
    

    Starting from Swift 5, you can use raw string literals and write regex escapes with a single backslash:

    let phoneRegEx = #"^((\(?0\d{4}\)?\s?\d{3}\s?\d{3})|(\(?0\d{3}\)?\s?\d{3}\s?\d{4})|(\(?0\d{2}\)?\s?\d{4}\s?\d{4}))(\s?#(\d{4}|\d{3}))?$"#
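
As a rough usage sketch, the raw string pattern can be checked against a few sample inputs. The helper name `isValidPhone` and the sample numbers are assumptions made for illustration, and the sketch uses NSRegularExpression rather than the NSPredicate from the question; the same pattern string should work with a "SELF MATCHES %@" predicate as well.

```swift
import Foundation

// Raw string literal: regex escapes keep their single backslash.
let phoneRegEx = #"^((\(?0\d{4}\)?\s?\d{3}\s?\d{3})|(\(?0\d{3}\)?\s?\d{3}\s?\d{4})|(\(?0\d{2}\)?\s?\d{4}\s?\d{4}))(\s?#(\d{4}|\d{3}))?$"#

// Hypothetical helper (not from the original post): returns true when the
// pattern, which is anchored with ^ and $, matches the candidate string.
func isValidPhone(_ candidate: String) -> Bool {
    guard let regex = try? NSRegularExpression(pattern: phoneRegEx) else {
        return false // the pattern failed to compile
    }
    let range = NSRange(candidate.startIndex..., in: candidate)
    return regex.firstMatch(in: candidate, options: [], range: range) != nil
}

// Made-up sample inputs.
let fiveDigitCode = isValidPhone("01234 567 890")
let bracketedCode = isValidPhone("(020) 1234 5678")
let noLeadingZero = isValidPhone("12345")
```

The first two calls should return true and the last one false, because every alternative in the pattern requires a leading 0 after the optional opening parenthesis.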
    

    Please refer to the Regular Expression Metacharacters table on the ICU Regular Expressions page to see which regex escapes must be written this way.

    Mind the difference between regex escapes (listed in the table above) and the string literal escape sequences that regular string literals use. The latter are listed, for example, under Special Characters in String Literals:

    String literals can include the following special characters:

    • The escaped special characters \0 (null character), \\ (backslash), \t (horizontal tab), \n (line feed), \r (carriage return), \" (double quotation mark) and \' (single quotation mark)
    • An arbitrary Unicode scalar value, written as \u{n}, where n is a 1–8 digit hexadecimal number (Unicode is discussed in Unicode below)

    So, in a regular string literal, "\"" is a string containing a single " character. A double quotation mark has no special meaning to the regex engine, so the "\"" literal is enough as a pattern to match a " char. The literal "\\\"", which represents the two-character string \", also matches a " char, but the extra backslash is redundant. Likewise, "\n" (an LF character) matches a newline in the same way as "\\n" does: "\n" puts a literal newline char into the pattern, while "\\n" passes the regex escape \n, defined in the ICU regex escape table, to the regex engine.
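
The equivalences described above can be checked with a small sketch. The helper `matches(_:in:)` and the sample strings are assumptions introduced only for this illustration; the search goes through Foundation's `range(of:options:)` with `.regularExpression`.

```swift
import Foundation

// Hypothetical helper: true when `pattern` finds a match anywhere in `text`.
func matches(_ pattern: String, in text: String) -> Bool {
    return text.range(of: pattern, options: .regularExpression) != nil
}

let quoted = "say \"hi\""   // contains two " characters
let twoLines = "a\nb"       // contains one LF character

// The regex engine receives the pattern:  "
let quotePlain = matches("\"", in: quoted)
// The regex engine receives the pattern:  \"  (redundant escape)
let quoteEscaped = matches("\\\"", in: quoted)
// The regex engine receives an actual LF character
let lfLiteral = matches("\n", in: twoLines)
// The regex engine receives the regex escape:  \n
let lfEscape = matches("\\n", in: twoLines)
```

All four constants should come out as true, which is why the shorter literal in each pair is usually the better choice.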

    In raw string literals, \ is just a literal backslash.
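
A minimal sketch of that last point, using a made-up pattern fragment: a raw literal and a regular literal with doubled backslashes produce the same string. It also shows that, inside a single-# raw literal, string escape sequences are still available when written with the extra #, as in \#n.

```swift
// Same pattern text written as a raw and as a regular string literal.
let rawPattern = #"\d{3}\s?"#
let regularPattern = "\\d{3}\\s?"
let samePattern = (rawPattern == regularPattern)

// Count the backslashes that actually end up in the string.
let backslashCount = rawPattern.filter { $0 == "\\" }.count

// Inside a single-# raw literal, \#n is the string escape for LF,
// while a plain \n stays as two characters: a backslash and an n.
let rawLineFeed = #"\#n"#
let rawBackslashN = #"\n"#
```

Here `samePattern` should be true and `backslashCount` should be 2, while `rawLineFeed` holds one character and `rawBackslashN` holds two.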