r json rtweet

Why is rtweet's parse_stream() function returning a NULL object?


I have a series of .json files each containing data captured from between 500 and 10,000 tweets (3-40 MB each). I am trying to use rtweet's parse_stream() function to read these files into R and store the tweet data in a data table. I have tried the following:

library(rtweet)
tweets <- parse_stream(path = "india1_2019090713.json")

There is no error message and the command creates a tweets object, but it is empty (NULL). I have tried this with other .json files, and the result is the same. Has anyone encountered this behaviour, or is there something obvious I am doing wrong? I would appreciate any advice for an rtweet newbie!

I am using rtweet version 0.6.9.

Many thanks!


Solution

  • As an update and partial answer: I have not made progress with the original issue, but I have had much more success with the jsonlite package, which handles large and complex .json files containing Tweet data well.

    library(jsonlite)
    

    I used the fromJSON() function as detailed here. I found I needed to edit the original .json file so that it forms a single JSON array: the file has to begin with [ and end with ], and each Tweet has to be followed by a comma before its line break (except the last one, since JSON does not allow a trailing comma). Then:

    tweetsdf <- fromJSON("india1_2019090713.json", simplifyDataFrame = TRUE, flatten = TRUE)
    

    simplifyDataFrame ensures the contents are saved as a data frame with one row per Tweet, and flatten collapses most of the nested Tweet attributes to separate columns for each sub-value rather than generating columns full of unwieldy list structures.
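
The manual edit described above (square brackets around the file, commas between Tweets) can probably be done in R rather than by hand. The following is only a minimal sketch: it assumes each file holds one JSON object per line, which is what the edits above suggest but which I have not checked against the original files. The temporary file and its two records are made-up stand-ins for a real capture such as india1_2019090713.json.

```r
library(jsonlite)

# Hypothetical stand-in for a captured file: one JSON object per line.
path <- tempfile(fileext = ".json")
writeLines(c(
  '{"id_str": "1", "text": "first", "user": {"screen_name": "a"}}',
  '',
  '{"id_str": "2", "text": "second", "user": {"screen_name": "b"}}'
), path)

# Read the raw lines and drop blank ones, so no empty element ends up
# between two commas.
lines <- readLines(path, warn = FALSE)
lines <- lines[nzchar(trimws(lines))]

# Join the objects with commas and wrap them in square brackets.
# This is the same edit as above, done in memory instead of in the file.
json_array <- paste0("[", paste(lines, collapse = ","), "]")

# fromJSON() accepts a JSON string as well as a file path.
tweetsdf <- fromJSON(json_array, simplifyDataFrame = TRUE, flatten = TRUE)
```

With real data, replace `path` with the name of the captured file. jsonlite also provides stream_in(), which is designed for newline-delimited JSON and may make the array step unnecessary; I have not tested whether it copes with these particular files.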