I need to read .jsonl files into R, and it's going very slowly. For a file with 67,000 lines, loading took over 10 minutes. Here's my code:
library(dplyr)
library(tidyr)
library(rjson)

f <- data.frame(Reduce(rbind, lapply(readLines("filename.jsonl"), fromJSON)))
f2 <- f %>%
  unnest(cols = names(f))
Here's a sample of the .jsonl file:
{"UID": "a1", "str1": "Who should win?", "str2": "Who should we win?", "length1": 3, "length2": 4, "prob1": -110.5, "prob2": -108.7}
{"UID": "a2", "str1": "What had she walked through?", "str2": "What had it walked through?", "length1": 5, "length2": 5, "prob1": -154.6, "prob2": -154.8}
So my questions are: (1) Why is this taking so long to run, and (2) How do I fix it?
Your version is slow mainly because of Reduce(rbind, ...): each rbind copies the entire accumulated data frame before appending one row, so the runtime grows quadratically with the number of lines.

I think the most efficient way to read in JSON Lines files is the stream_in() function from the jsonlite package. stream_in() requires a connection as input, but you can use the following function to read a normal text file:
read_json_lines <- function(file) {
  con <- file(file, open = "r")
  on.exit(close(con))  # close the connection even if stream_in() errors
  jsonlite::stream_in(con, verbose = FALSE)
}
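
As a quick sketch of how this might be used (reusing the file name from the question):

f <- read_json_lines("filename.jsonl")

stream_in() parses all the lines in batches and returns a single flattened data frame with one row per JSON line, so the separate unnest() step from the original code should no longer be necessary.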