Suppose I have a string in R,
mystring = 'help me'
but with a twist: the space between 'help' and 'me' is actually a non-breaking space. In UTF-8 the non-breaking space (U+00A0) is the two bytes c2 a0, which R displays as <c2><a0>, so this string can be created by
mystring = rawToChar(as.raw(as.hexmode(c('68','65','6c','70','c2','a0','6d','65'))))
Then, for example, grepl('help me', mystring) will be FALSE.
How can I replace the non-breaking space with a regular space? And in general, how can I replace any particular raw value(s) with a particular character? Ideally I would be able to write a function like
gsubRaw(mystring, as.raw(as.hexmode(c('c2','a0'))), ' ')
This answer almost answers my question, except that I don't want to replace ALL non-ASCII characters with a space, only the non-breaking space.
grepRaw() also came close, because it can detect the position in the string that the raw characters occur and they can then be replaced. However, it didn't work cleanly: sometimes the position in the string that grepRaw() returned wasn't the same as the position of the non-breaking space in the string-as-plain-text, and I don't know how to replace the raw values themselves.
Following the comments on my answer to the other question, we can do this by using the fact that the non-breaking space is the byte sequence \xc2\xa0 (at least in R 4.3.1 on Windows):
mystring = rawToChar(as.raw(as.hexmode(c('68','65','6c','70','c2','a0','6d','65'))))
grepl('help me', mystring)
#> [1] FALSE
tools::showNonASCII(mystring)
#> 1: help<c2><a0>me
identical('help\xc2\xa0me', mystring)
#> [1] TRUE
mynewstring = gsub('\xc2\xa0+', ' ', mystring)
grepl('help me', mynewstring)
#> [1] TRUE
tools::showNonASCII(mynewstring)
Created on 2023-07-05 with reprex v2.0.2
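If you want the general gsubRaw() from the question, here is a minimal sketch. It assumes a single input string and a literal (non-regex) byte pattern. gsubRaw() is a hypothetical helper, not something that ships with R. It works on the raw bytes throughout, so the byte offsets returned by grepRaw() are used directly and never have to be mapped back to character positions, which is most likely where the mismatch in the question came from.

```r
# Hypothetical helper (not part of base R): replace every occurrence of a
# raw byte sequence in a single string with a replacement string.
gsubRaw <- function(x, pattern, replacement) {
  stopifnot(is.character(x), length(x) == 1L,
            is.raw(pattern), length(pattern) > 0L)
  bytes <- charToRaw(x)
  if (length(bytes) == 0L) return(x)

  # grepRaw() reports byte offsets, not character positions
  starts <- grepRaw(pattern, bytes, fixed = TRUE, all = TRUE)
  if (length(starts) == 0L) return(x)

  repl <- charToRaw(replacement)
  out <- raw(0)
  pos <- 1L
  for (s in starts) {
    if (s < pos) next                       # skip a match overlapping the previous one
    if (s > pos) out <- c(out, bytes[pos:(s - 1L)])  # copy bytes before the match
    out <- c(out, repl)
    pos <- s + length(pattern)
  }
  # copy whatever follows the last match
  if (pos <= length(bytes)) out <- c(out, bytes[pos:length(bytes)])

  result <- rawToChar(out)
  Encoding(result) <- Encoding(x)           # keep the declared encoding of the input
  result
}

mystring <- rawToChar(as.raw(as.hexmode(c('68','65','6c','70','c2','a0','6d','65'))))
gsubRaw(mystring, as.raw(c(0xc2, 0xa0)), ' ')
#> [1] "help me"
```

Unlike the '\xc2\xa0+' pattern above, this replaces each occurrence separately, so a run of several non-breaking spaces becomes the same number of regular spaces.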