
Remove symbols from character data column


Is it possible to remove symbols from a character column in R Markdown?

c("Between ��20,000-��29,999 --", "Between �20,000-�29,999 --")

I would like to replace � with £ (or with an empty string ""). I would leave the values as they are, but some responses contain one � and others contain two, so otherwise identical answers do not group together correctly.

I have tried

stri_replace_all_regex(f$`What is the number that best describes your TOTAL household income BEFORE TAX?`, c("�")," ") 

but without success.


Solution

  • vec <- c("Between ��20,000-��29,999 --", "Between �20,000-�29,999 --")
    gsub("�+", "£", vec)
    # [1] "Between £20,000-£29,999 --" "Between £20,000-£29,999 --"
    

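    The + quantifier means "one or more consecutive occurrences" of the preceding character, so each run of � (whether one or two) is replaced by a single £.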

    If you prefer stringr, only the order of the arguments changes:

    stringr::str_replace_all(vec, "�+", "£")
    # [1] "Between £20,000-£29,999 --" "Between £20,000-£29,999 --"
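
    Since the goal in the question is grouping, here is a minimal sketch of applying the same replacement to a data frame column and then counting the groups. The data frame `f` and the column names `income` and `income_clean` are hypothetical stand-ins for the long survey column in the question, and the `\uFFFD` and `\u00A3` escapes are used only so the example does not depend on the encoding of the script file.

```r
# Hypothetical data frame; `income` stands in for the long survey column name.
f <- data.frame(
  income = c("Between \uFFFD\uFFFD20,000-\uFFFD\uFFFD29,999 --",
             "Between \uFFFD20,000-\uFFFD29,999 --"),
  stringsAsFactors = FALSE
)

# Collapse each run of the replacement character (U+FFFD)
# into a single pound sign (U+00A3).
f$income_clean <- gsub("\uFFFD+", "\u00A3", f$income)

# Both rows now carry the same label, so they fall into one group.
counts <- table(f$income_clean)
```

    Writing the result to a new column keeps the original responses available in case the cleanup needs to be checked against them.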