So I have this multibyte string "UCA1\xa6\xc1" within a large vector of RNA names, which yields UCA1�� upon using the cat() function. I am trying to screen the vector for such strings and rename them to something else or if all else fails, remove them from the vector, as I cannot capitalize such strings with functions like toupper().
I'm not sure what encoding the bytes '\xa6' and '\xc1' belong to, so I don't know how to screen for them with a regex. Could anybody help me with this?
This is probably an encoding issue, so try changing the encoding when you load the file. Try something like this:
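# fileEncoding re-encodes the file as it is read; the plain `encoding`
# argument only marks strings as "latin1" or "UTF-8" without converting them.
df <- read.csv(file_path,
               fileEncoding = "ISO-8859-1",  # try other encodings if this one does not fit
               header = TRUE,
               stringsAsFactors = FALSE)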
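If re-reading the file is not an option, the vector itself can be screened for strings that are not valid UTF-8. This is only a rough sketch: `rna_names` is a made-up example vector standing in for yours, and it assumes the rest of your names are plain ASCII or UTF-8. `validUTF8()` (base R, 3.3.0 or later) flags the bad elements, and `iconv()` with `sub = ""` is a common way to strip the bytes that cannot be converted, though its exact behaviour can depend on the platform's iconv implementation.

```r
# Hypothetical example vector; the last element holds the raw bytes from the question.
rna_names <- c("malat1", "xist", "UCA1\xa6\xc1")

# TRUE for every element that is NOT valid UTF-8.
bad <- !validUTF8(rna_names)

# Option 1: keep the names but drop the bytes that cannot be converted.
cleaned <- iconv(rna_names, from = "UTF-8", to = "UTF-8", sub = "")

# Option 2: remove the offending elements from the vector altogether.
kept <- rna_names[!bad]

# Once the invalid bytes are gone, toupper() should work on the cleaned vector.
upper <- toupper(cleaned)
```

`rna_names[bad]` shows which names were affected, so you can rename them by hand instead if stripping the bytes loses information you need.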