Tags: regex, string, nlp, julia, escaping

re.escape() equivalent in Julia?


I have a bunch of abbreviations I'd like to use in regex matches, but they contain lots of regex reserved characters (like . ? $). In Python you can get an escaped (regex-safe) string using re.escape. For example:

re.escape("Are U.S. Pythons worth any $$$?") will return 'Are\\ U\\.S\\.\\ Pythons\\ worth\\ any\\ \\$\\$\\$\\?'

From my (little) experience with Julia so far, I can tell there's probably a much more straightforward way of doing this in Julia, but I couldn't find any previous answers on the topic.


Solution

  • Julia uses the PCRE2 library underneath, and relies on PCRE's regex-quoting syntax to automatically escape special characters when you concatenate a Regex with a normal String. For example:

    julia> r"\w+\s*" * raw"Are U.S. Pythons worth any $$$?"
    r"(?:\w+\s*)\QAre U.S. Pythons worth any $$$?\E"
    

    Here we've used a raw string to make sure that none of the characters are interpreted as special, including the $ signs.

    If we need interpolation, we can use a normal String literal instead. In this case, the interpolation is done first, and the result is then quoted with \Q ... \E.

    julia> snake = "Python"
    "Python"
    
    julia> r"\w+\s*" * "Are U.S. $snake worth any money?"
    r"(?:\w+\s*)\QAre U.S. Python worth any money?\E"
    

    So you can place the part of the regex you wish to be quoted in a normal String, and it will be quoted automatically when you join it with a Regex.

    You can also do this directly within the regex yourself: \Q starts a region where none of the regex-special characters are interpreted as special, and \E ends that region. Everything within such a region is treated literally by the regex engine.
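
As a rough sketch of the manual \Q ... \E approach applied to the original problem (a list of abbreviations), something like the following might work. The abbreviation list and the variable names (manual, abbrevs, abbrev_re, found) are made up for illustration, and the code assumes that no abbreviation contains the sequence \E, since that would end the quoted region early.

```julia
# Hand-written \Q...\E region: the dots in "U.S." are literal, not wildcards.
manual = r"\QU.S.\E\s+\w+"
println(occursin(manual, "Are U.S. Pythons worth it?"))  # literal dots are present
println(occursin(manual, "Are UXSX Pythons worth it?"))  # would only match if . were a wildcard

# Hypothetical abbreviation list; raw"..." keeps $ from being read as interpolation.
abbrevs = ["U.S.", "e.g.", "Ph.D.", raw"$$$"]

# Quote each entry and join the pieces with | to form a single alternation.
# Assumption: no entry contains \E, which would close the quoted region early.
abbrev_re = Regex(join(("\\Q" * a * "\\E" for a in abbrevs), "|"))

# Collect every abbreviation found in the sample sentence.
found = [m.match for m in eachmatch(abbrev_re, raw"Are U.S. Pythons worth any $$$?")]
println(found)
```

Building the pattern string by hand like this is only needed when the quoted pieces have to sit inside a larger construct such as an alternation. For simple prefix or suffix cases, joining a Regex with a String as shown earlier is less error-prone because Julia does the quoting for you.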