I have a data frame containing tweets. I want to remove the stop words, so I used:
library(tm)
stopWords <- stopwords("en")
tweets_sample$text <- removeWords(tweets_sample$text, stopWords)
However, I got this error:
Error in gsub(sprintf("(*UCP)\\b(%s)\\b", paste(sort(words, decreasing = TRUE), :
input string 1 is invalid UTF-8
What causes this error?
Looks like an encoding issue. Try
Encoding(tweets_sample$text) <- "UTF-8"
and then call the removeWords function again.
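Note that Encoding<- only declares how the bytes should be interpreted; it does not convert them. If the tweets were actually read in another encoding, the strings may still be invalid UTF-8 afterwards. Here is a minimal sketch of how you might check and convert them, assuming the offending text is Latin-1 (that is a guess; the sample vector below is made up for illustration and stands in for tweets_sample$text):

```r
# Hypothetical sample: the second element contains the Latin-1 byte 0xE7,
# which is not valid UTF-8 on its own.
text <- c("this is a tweet", "fa\xE7ile")

# validUTF8() (base R >= 3.3.0) flags elements that are not valid UTF-8.
bad <- !validUTF8(text)

# Assumption: the invalid strings are really Latin-1, so convert them.
text[bad] <- iconv(text[bad], from = "latin1", to = "UTF-8")

# Alternative if the source encoding is unknown: drop unconvertible bytes.
# text <- iconv(text, from = "UTF-8", to = "UTF-8", sub = "")

# Every element should now be valid UTF-8.
stopifnot(all(validUTF8(text)))

# With clean input, removeWords() from tm should no longer hit the gsub() error.
if (requireNamespace("tm", quietly = TRUE)) {
  text <- tm::removeWords(text, tm::stopwords("en"))
}
```

If the check shows no invalid elements in your data, the Encoding() assignment alone may be enough.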