
How to resolve "Error in gsub" with removeWords in R


I have a data frame of tweets and I want to remove stop words, so I used:

library(tm)  # provides stopwords() and removeWords()
stopWords <- stopwords("en")
tweets_sample$text <- removeWords(tweets_sample$text, stopWords)

However, I got this error:

Error in gsub(sprintf("(*UCP)\\b(%s)\\b", paste(sort(words, decreasing = TRUE),  : 
input string 1 is invalid UTF-8

What causes this error, and how can I fix it?


Solution

  • This looks like an encoding issue. Try setting Encoding(tweets_sample$text) <- "UTF-8" first, then run removeWords() again.
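Note that Encoding<- only changes the declared encoding of the strings; it does not change their bytes. If some tweets contain byte sequences that are not valid UTF-8, the error can persist. The sketch below assumes the text is meant to be UTF-8 and that dropping the occasional undecodable byte is acceptable. It uses base R's validUTF8() to find the offending rows and iconv(..., sub = "") to strip the invalid bytes before calling removeWords(). The clean_utf8() helper and the toy data frame are hypothetical stand-ins for the real tweets_sample.

```r
library(tm)  # stopwords(), removeWords()

# Hypothetical helper: re-encode a character vector as UTF-8,
# replacing any byte sequences that cannot be converted with "".
clean_utf8 <- function(x) {
  iconv(x, from = "UTF-8", to = "UTF-8", sub = "")
}

# Toy data standing in for tweets_sample (assumption: the real data
# has a character column named `text`). "\xe9" is a raw Latin-1 byte,
# which is not valid UTF-8 on its own.
tweets_sample <- data.frame(
  text = c("this is a tweet about the weather", "caf\xe9 is open"),
  stringsAsFactors = FALSE
)

# Locate the rows whose bytes are not valid UTF-8.
bad <- which(!validUTF8(tweets_sample$text))
print(bad)

# Drop the invalid bytes, then confirm every string is valid UTF-8.
tweets_sample$text <- clean_utf8(tweets_sample$text)
stopifnot(all(validUTF8(tweets_sample$text)))

# removeWords() now receives valid UTF-8 input.
tweets_sample$text <- removeWords(tweets_sample$text, stopwords("en"))
print(tweets_sample$text)
```

If you know the original encoding (for example, the flagged tweets were saved as Latin-1), converting only the rows flagged by validUTF8() with iconv(x, from = "latin1", to = "UTF-8") keeps characters such as "é" instead of dropping them.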