The question is pretty simple. I'm trying to replace "\U" throughout a vector of strings, and for this I'm using the package {stringr}, but I'm having issues matching the pattern.
text <- "\U0001f517"
stringr::str_detect(text, "\U")
#> Error: '\U' used without hex digits in character string starting ""\U"
stringr::str_detect(text, "\\U")
#> Error in stri_detect_regex(string, pattern, negate = negate, opts_regex = opts(pattern)) :
#> Unrecognized backslash escape sequence in pattern. (U_REGEX_BAD_ESCAPE_SEQUENCE, context=`\U`)
stringr::str_detect(text, "\\\U")
#> Error: '\U' used without hex digits in character string starting ""\\\U"
stringr::str_detect(text, "\\\\U")
#> FALSE
stringr::str_detect(text, "\\\\\U")
#> Error: '\U' used without hex digits in character string starting ""\\\\\U"
stringr::str_detect(text, "\\\\\\U")
#> Error in stri_detect_regex(string, pattern, negate = negate, opts_regex = opts(pattern)) :
#> Unrecognized backslash escape sequence in pattern. (U_REGEX_BAD_ESCAPE_SEQUENCE, context=`\\\U`)
stringr::str_detect(text, "\\\\\\\U")
#> Error: '\U' used without hex digits in character string starting ""\\\\\\\U"
# ... you get the idea
As far as I can tell, this issue arises because the regex engine sees "\U" as indicating the beginning of a new hex code, as indicated by the first error. Other characters work fine:
text <- "\a0001f517"
stringr::str_detect(text, "\a")
#> TRUE
I've seen other questions around this issue, e.g. here, but still can't get this to work. Can anyone give me a working regex for this?
The \U in your text <- "\U0001f517" is not a separate character sequence; it is part of the Unicode code point notation. The literal text in the text variable is in fact 🔗, which you can easily check using cat(text).
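A minimal base R sketch of the point above, assuming nothing beyond the text variable from the question: the string holds a single character, so there is no backslash or U in it to match, while matching the character itself works.

```r
text <- "\U0001f517"

# The whole escape denotes one character, not "\", "U" and eight digits.
nchar(text)
## => 1
utf8ToInt(text) == 0x1f517
## => TRUE

# There is no backslash and no "U" in the string to match.
grepl("\\", text, fixed = TRUE)
## => FALSE
grepl("U", text, fixed = TRUE)
## => FALSE

# Matching the character itself works.
grepl("\U0001f517", text, fixed = TRUE)
## => TRUE
```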
In contrast, "\a" is a single character (the "Bell" character) that can also be written as "\u0007" or "\x07" (run "\a" == '\x07' and you will see that the output is TRUE). That is why your second example matches. See more about string escape sequences syntax.
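The equivalence of these spellings can be checked directly. This is a small sketch using only base R functions:

```r
# All three literals denote the same one-character string (code point 7).
bell <- "\a"

identical(bell, "\x07")
## => TRUE
identical(bell, "\u0007")
## => TRUE
utf8ToInt(bell)
## => 7
nchar(bell)
## => 1
```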
To get the escaped code point notation as literal text instead, you can encode the string, for example with utf8_encode() from the utf8 package:
text <- "\U0001f517"
cat(text)
## => 🔗
library("utf8")
text <- utf8_encode(text)
cat(text)
## => \U0001f517
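If you would rather not depend on an extra package, something similar can be sketched in base R. Note that escape_non_ascii() below is a hypothetical helper written for illustration, not part of utf8 or stringr, and it assumes valid, non-NA UTF-8 input. Once the string contains a literal backslash followed by U, a fixed (non-regex) pattern can find and replace it.

```r
# Hypothetical helper: rewrite every non-ASCII character as literal \UXXXXXXXX text.
escape_non_ascii <- function(x) {
  vapply(x, function(s) {
    cps <- utf8ToInt(s)  # code points of one string
    out <- ifelse(cps > 127L,
                  sprintf("\\U%08x", cps),           # literal backslash, U, 8 hex digits
                  intToUtf8(cps, multiple = TRUE))   # keep ASCII characters unchanged
    paste(out, collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

escaped <- escape_non_ascii("link: \U0001f517")
cat(escaped)
## => link: \U0001f517

# Now there really is a "\U" in the text, so it can be matched and replaced.
grepl("\\U", escaped, fixed = TRUE)
## => TRUE
replaced <- gsub("\\U", "U+", escaped, fixed = TRUE)
cat(replaced)
## => link: U+0001f517
```

Using fixed = TRUE sidesteps the regex engine entirely, so the pattern needs only the one level of escaping that R's string parser requires.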