r twitter rtweet

How can I collect tweets from within the last seven days using rtweet package?


I have started using the rtweet package, and so far I have had good results with my queries and with the language and geocode parameters. However, I still do not know how to collect Twitter data from within the last 7 days.

For example, in the next code chunk I want to extract data for 7 days, but I am not sure whether the collected tweets will run from 2017-06-29 until 2017-06-05 or from 2017-06-22 until 2017-06-29:

library(rtweet)

## Stream all tweets mentioning AMLO or lopezobrador for 7 days
stream_tweets("AMLO,lopezobrador",
              timeout = 60 * 60 * 24 * 7,
              file_name = "tweetsaboutAMLO.json",
              parse = FALSE)

## Read in the data as a tidy tbl data frame
AMLO <- parse_stream("tweetsaboutAMLO.json")

Do you know if there are any commands in rtweet to specify the time frame to use when using the search_tweets() or stream_tweets() functions?


Solution

  • So, to answer your question about how to write it more efficiently, you could try a for loop or a list apply. Here I show the for loop.

    First, create a vector with the 4 dates you are calling.

    fechas <- seq.Date(from = as.Date("2018-06-24"), to = as.Date("2018-06-27"), by = 1)
    

    Then create an empty data.frame to store your tweets.

    df_tweets <- data.frame()
    

    Now, loop along your list and populate the empty data.frame.

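    for (i in seq_along(fechas)) {
      # mexico_coord is assumed to be defined beforehand (a geocode string)
      df_temp <- search_tweets("lang:es",
                               geocode = mexico_coord,
                               until = as.character(fechas[i]),  # "YYYY-MM-DD"
                               n = 100)
      df_tweets <- rbind(df_tweets, df_temp)
    }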
    
    summary(df_tweets)
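
    The "list apply" alternative mentioned above could look roughly like the sketch below. So that it runs without Twitter credentials, it uses fetch_until(), a hypothetical stand-in for search_tweets() that is not part of rtweet; in real use you would call search_tweets() with the same until argument instead.

    ```r
    # Hypothetical stand-in for search_tweets(): returns a small data frame
    # so the pattern can be run without Twitter API credentials.
    fetch_until <- function(until, n = 3) {
      data.frame(until = rep(as.character(until), n),
                 text = paste("tweet", seq_len(n)),
                 stringsAsFactors = FALSE)
    }

    fechas <- seq.Date(from = as.Date("2018-06-24"), to = as.Date("2018-06-27"), by = 1)

    # Collect one data frame per date in a list ...
    tweets_por_fecha <- lapply(seq_along(fechas), function(i) fetch_until(fechas[i]))

    # ... then bind them together once, instead of growing a data frame in a loop.
    df_tweets <- do.call(rbind, tweets_por_fecha)
    nrow(df_tweets)
    ```

    Binding once at the end avoids copying the accumulated data frame on every iteration, which starts to matter as the number of dates grows.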
    

    On the other hand, the following solution might be more convenient and efficient altogether:

    library(tidyverse)
    df_tweets2 <- search_tweets("lang:es",
                                geocode = mexico_coord,
                                until = "2018-06-29",  ## or latest date
                                n = 10000)
    df_tweets2 %>%
      group_by(as.Date(created_at)) %>%  ## Group (or set apart) the tweets by date of creation
      sample_n(100)   ## Obtain 100 random tweets for each group, in this case, for each date.
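
One caveat with the per-date sampling: sample_n(100) stops with an error if any date has fewer than 100 tweets (unless replace = TRUE is set). A minimal base-R sketch of a capped version follows. The toy data frame and the helper sample_per_day() are made up for illustration and are not part of rtweet or dplyr.

```r
set.seed(1)  # only to make the toy example reproducible

# Toy stand-in for df_tweets2: 5 tweets on one day, 2 on the next.
toy <- data.frame(
  created_at = as.POSIXct(c(rep("2018-06-27 10:00:00", 5),
                            rep("2018-06-28 10:00:00", 2)), tz = "UTC"),
  text = paste("tweet", 1:7),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sample at most `size` rows from each calendar day.
sample_per_day <- function(df, size) {
  dias <- format(as.Date(df$created_at, tz = "UTC"))
  grupos <- split(df, dias)
  muestras <- lapply(grupos, function(g) {
    g[sample(nrow(g), min(size, nrow(g))), , drop = FALSE]
  })
  do.call(rbind, muestras)
}

muestra <- sample_per_day(toy, size = 3)
table(as.Date(muestra$created_at, tz = "UTC"))
```

Days with fewer tweets than the requested size simply contribute all of their rows, so the call does not fail on sparse dates.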