Remove ASCII control characters from string


I have a df that contains a column with string values in it. Some of these strings are combinations of characters and dates, some of characters and numbers. There are some instances where the string will have punctuation, such as "()" or "#". Having these types of characters in the string is perfectly fine. This df is ultimately written to an Excel file.

The issue I've run into is that the "STX" ASCII control character is embedded in one of the strings and I can't seem to get it removed, which causes issues when opening the Excel file after the data has been written to it. Here's an example of what that string may look like:

'Value 1/2: This, That, Third, Random STX Value, Random 2, Value 6'

I've tried the following, but neither worked:

str_replace_all(df$col, "[[:punct:]]", "")
iconv(df$col, "ASCII", 'UTF-8', sub = "")

Does anyone know how I can get this removed?


Solution

  • You say you want to remove all occurrences of the STX character from your strings.

    You can do it with a simple gsub call. It searches for a pattern or a fixed string (depending on the value of the fixed argument) and replaces each match with a replacement pattern or another fixed string:

    df$col = gsub("\x02", "", df$col, fixed=TRUE)
    

    What is \x02? It is a string escape sequence where \x signals the start of the construct and the next two characters are interpreted as a hexadecimal number. Here that number is 02, the ASCII code of STX.

    The fixed=TRUE argument tells the R engine to search for the STX character as a literal char, not as a regex pattern, which usually results in better performance and avoids other regex-related issues when all you need is to replace a literal text with another literal text.
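
The approach above can be sketched end to end as follows. This is a minimal illustration, not the asker's actual data: the data frame contents, the has_stx variable, and the strip_cntrl helper are hypothetical names introduced here. The last step, removing every control character with the POSIX class [[:cntrl:]], goes beyond the original answer and is shown only as one possible generalization.

```r
# Hypothetical sample data: the second value contains an embedded STX (0x02).
df <- data.frame(
  col = c("Value 1/2: This, That",
          paste0("Random ", "\x02", "Value, Random 2 (#6)")),
  stringsAsFactors = FALSE
)

# Find the rows that contain the STX character (literal match, no regex).
has_stx <- grepl("\x02", df$col, fixed = TRUE)

# Remove only STX, exactly as in the answer above.
# Punctuation such as "/", "(", ")" and "#" is left untouched.
df$col <- gsub("\x02", "", df$col, fixed = TRUE)

# Broader variant (an assumption, not part of the original answer):
# strip every control character. Note that this also removes tabs and
# newlines, which may or may not be wanted.
strip_cntrl <- function(x) gsub("[[:cntrl:]]", "", x)
```

This also suggests why the two attempts in the question had no effect: STX is a control character rather than punctuation, so "[[:punct:]]" does not match it, and it is a valid ASCII byte, so iconv has nothing to substitute.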