Tags: r, text-mining, tm, twitter-r

Getting data from Twitter with R?


I am using R 3.1.3 on platform x86_64-apple-darwin13.4.0 (64-bit) with version 0.6-2 of the tm package.

Here is my code:

install.packages(c("twitteR","ROAuth","RCurl","tm","wordcloud","SnowballC"))
library(SnowballC)
library(twitteR)
library(ROAuth)
library(RCurl)
library(tm)
library(wordcloud)
#twitter authentication
consumerKey <- " "
consumerSecret <- " "
accessToken <- " "
accessTokenSecret <- " "

twitteR::setup_twitter_oauth(consumerKey,consumerSecret,accessToken,accessTokenSecret)

#retrieve tweets from Twitter
tweets=searchTwitter("euro2016+france",lang = "en",n=500,resultType = "recent")
class(tweets)
head(tweets)
#converting list to vector
tweets_text=sapply(tweets,function(x) x$getText())
str(tweets_text)
#creates corpus from vector of tweets
tweets_corpus=Corpus(VectorSource(tweets_text))
inspect(tweets_corpus[100])

#cleaning
tweets_clean=tm_map(tweets_corpus,removePunctuation,lazy= T)
tweets_clean=tm_map(tweets_clean,content_transformer(tolower),lazy = T)
tweets_clean=tm_map(tweets_clean,removeWords,stopwords("english"),lazy = T)
tweets_clean=tm_map(tweets_clean,removeNumbers,lazy = T)
tweets_clean=tm_map(tweets_clean,stripWhitespace,lazy = T)
tweets_clean=tm_map(tweets_clean,removeWords,c("euro2016","france"),lazy = T)
#wordcloud play with parameters
wordcloud(tweets_clean)

When I run the final line, I get:

Error in UseMethod("meta", x) :
  no applicable method for 'meta' applied to an object of class "try-error"
In addition: Warning messages:
1: In mclapply(x$content[i], function(d) tm_reduce(d, x$lazy$maps)) :
  all scheduled cores encountered errors in user code
2: In mclapply(unname(content(x)), termFreq, control) :
  all scheduled cores encountered errors in user code

Does anyone know a solution for this?


Solution

  • Somehow there seems to be an encoding problem with the removeWords function when it is used together with the tm_map function (see also here); a complementary sketch is given after the code below.

    A workaround could be to apply the function earlier, at the point where you load the text into the corpus:

    #converting list to vector
    tweets_text=sapply(tweets,function(x) x$getText())
    str(tweets_text)
    
    # removing words
    tweets_text<- sapply(tweets_text, function(x) removeWords(x, c("euro2016","france")))
    tweets_text<- sapply(tweets_text, function(x) removeWords(x, stopwords("english")))
    
    
    #creates corpus from vector of tweets
    tweets_corpus=Corpus(VectorSource(tweets_text))
    inspect(tweets_corpus[100])
    
    #cleaning
    tweets_clean=tm_map(tweets_corpus,removePunctuation)
    tweets_clean=tm_map(tweets_clean,content_transformer(tolower))
    #tweets_clean=tm_map(tweets_clean,removeWords,stopwords("english"))
    tweets_clean=tm_map(tweets_clean,removeNumbers,lazy = T)
    tweets_clean=tm_map(tweets_clean,stripWhitespace,lazy = T)
    #tweets_clean=tm_map(tweets_clean,removeWords,c("euro2016","france"),lazy = T)
    wordcloud(tweets_clean)
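
    Since the warnings mention mclapply, the per-document error is being swallowed
    by tm's parallel processing, which is why the message is so opaque. Below is a
    complementary sketch, assuming the root cause is a stray non-UTF-8 character in
    the tweet text: normalise the encoding before building the corpus and force
    serial execution so any remaining error is reported directly.

    # replace bytes that are invalid in the current encoding
    tweets_text <- iconv(tweets_text, to = "UTF-8", sub = "byte")

    # run on a single core so real errors surface instead of
    # "all scheduled cores encountered errors in user code"
    options(mc.cores = 1)

    tweets_corpus <- Corpus(VectorSource(tweets_text))
    tweets_clean <- tm_map(tweets_corpus, content_transformer(tolower))
    tweets_clean <- tm_map(tweets_clean, removeWords, stopwords("english"))

    Once the cleaning runs without errors, the "play with parameters" comment can be
    taken literally; for example (the parameter values are only an illustration, and
    brewer.pal comes from the RColorBrewer package, so add library(RColorBrewer) if
    it is not already attached):

    wordcloud(tweets_clean, min.freq = 3, max.words = 100,
              random.order = FALSE, colors = brewer.pal(8, "Dark2"))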