
tm_map gives an error in R


This is my first time doing Twitter analytics.

    # Search data from Twitter
    library("twitteR")
    SearchData <- searchTwitter("Bruno Mars", n = 1000, lang = "en")
    SearchData

    # Scraping data
    userTimeline("BrunoMars", n = 100, maxID = NULL, excludeReplies = FALSE, includeRts = FALSE)

    class(SearchData)
    head(SearchData)

    # Cleaning data
    library(NLP)
    library(tm)

    TweetList <- sapply(SearchData, function(x) x$getText())

    TweetList <- TweetList[!is.na(TweetList)]
    TweetCorpus <- Corpus(VectorSource(TweetList))
    TweetCorpus <- iconv(TweetCorpus, to = "utf-8")

    # Remove punctuation and numbers, then change data to lower case
    TweetCorpus <- tm_map(TweetCorpus, removePunctuation)
    TweetCorpus <- tm_map(TweetCorpus, removeNumbers)
    TweetCorpus <- tm_map(TweetCorpus, tolower)

I get the following error at my last 3 lines:

    Error in UseMethod("tm_map", x) : no applicable method for 'tm_map' applied to an object of class "character"

I have tried to fix the problem myself by adding content_transformer before removePunctuation, removeNumbers and tolower in my code, but I still get the same error. I really have no idea what is wrong, and I would appreciate your suggestions and advice. I have been trying to fix this issue for a few days, but it has not been solved yet.

Thanks so much, Ros


Solution

  • tm_map has to be applied to a Corpus object, not a character vector. But iconv turns your TweetCorpus object from a Corpus back into a character vector.

    To fix this, switch the order of your pre-processing, so that you use iconv before you turn the tweets into a Corpus object:

    TweetList <- c("hello", "world", "Hooray", "yep")
    TweetList <-  iconv(TweetList, to ="utf-8")
    TweetCorpus <- Corpus(VectorSource(TweetList))
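
For completeness, here is a minimal sketch of how the whole cleaning step might look with the reordering applied. The sample tweets are made up for illustration (in the question they come from searchTwitter), and the `cleaned` variable is only there to inspect the result. It also assumes that wrapping tolower in content_transformer is wanted, since tolower is a base R function rather than a tm transformation.

```r
library(tm)

# Made-up stand-in tweets; in the question these come from getText()
TweetList <- c("Hello, World! 24K Magic", "Uptown Funk 2014!!", NA)

# Drop missing entries and convert the encoding while the data
# is still a plain character vector
TweetList <- TweetList[!is.na(TweetList)]
TweetList <- iconv(TweetList, to = "UTF-8")

# Build the corpus only after the character-level clean-up,
# so that tm_map receives a Corpus object
TweetCorpus <- Corpus(VectorSource(TweetList))

# removePunctuation and removeNumbers are tm transformations
TweetCorpus <- tm_map(TweetCorpus, removePunctuation)
TweetCorpus <- tm_map(TweetCorpus, removeNumbers)

# tolower is a base R function, so it is wrapped in content_transformer()
TweetCorpus <- tm_map(TweetCorpus, content_transformer(tolower))

# Pull the cleaned text back out as a character vector for inspection
cleaned <- vapply(seq_along(TweetCorpus),
                  function(i) as.character(TweetCorpus[[i]]),
                  character(1))
print(cleaned)
```

The key point is only the ordering: everything that works on character vectors (is.na filtering, iconv) happens before Corpus(), and everything after that goes through tm_map.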