I have a series of .json files, each containing data captured from between 500 and 10,000 tweets (3-40 MB each). I am trying to use rtweet's parse_stream() function to read these files into R and store the tweet data in a data table. I have tried the following:
tweets <- parse_stream(path = "india1_2019090713.json")
There is no error message and the command creates a tweets object, but it is empty (NULL). I have tried this with other .json files, and the result is the same. Has anyone encountered this behaviour, or is there something obvious I am doing wrong? I would appreciate any advice for an rtweet newbie!
I am using rtweet version 0.6.9.
Many thanks!
As an update and partial answer: I've not made progress with the original issue, but I have had a lot more success using the jsonlite package, which handles large and complex .json files containing Tweet data without trouble.
library(jsonlite)
I used the fromJSON() function as detailed here. I found I needed to edit the original .json file to match the structure fromJSON() expects, namely a single JSON array: the file must begin and end with square brackets ([ ]), and each Tweet must be followed by a comma before the line break, except the last one (a trailing comma is not valid JSON). Then:
tweetsdf <- fromJSON("india1_2019090713.json", simplifyDataFrame = TRUE, flatten = TRUE)
simplifyDataFrame ensures the contents are saved as a data frame with one row per Tweet, and flatten collapses most of the nested Tweet attributes to separate columns for each sub-value rather than generating columns full of unwieldy list structures.
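The manual edit (square brackets around the file, commas between Tweets) can also be done in R rather than by hand. The following is only a minimal sketch, assuming each Tweet sits on its own line as one complete JSON object; read_tweets_ndjson is a hypothetical helper name, not part of jsonlite or rtweet.

```r
library(jsonlite)

# Hypothetical helper (not part of jsonlite or rtweet).
# Assumption: the file holds one complete Tweet JSON object per line.
read_tweets_ndjson <- function(path) {
  # Read the raw lines and drop any blank ones
  lines <- readLines(path, warn = FALSE, encoding = "UTF-8")
  lines <- lines[nzchar(trimws(lines))]

  # Join the Tweets with commas (none after the last one) and wrap the
  # result in square brackets so it forms a single valid JSON array
  json_array <- paste0("[", paste(lines, collapse = ","), "]")

  # fromJSON() accepts a JSON string as well as a file path
  fromJSON(json_array, simplifyDataFrame = TRUE, flatten = TRUE)
}
```

It would be called as tweetsdf <- read_tweets_ndjson("india1_2019090713.json"), leaving the original file untouched. jsonlite also provides stream_in() for newline-delimited JSON, which may be worth trying if the files really are one Tweet per line.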