When converting json to columns in R using jsonlite, I came across a bug where the column values were in the wrong columns. I tried replicating it -
R> dt <- data.table(id = 1:2,
json = c('{"user": "xyz2", "weightmap": "w1", "domains": "d1"}',
'{"weightmap": "w2", "user": "abcd", "domains": "d2"}'))
R> dt[,fromJSON(json),.(id)]
id user weightmap domains
1: 1 xyz2 w1 d1
2: 2 w2 abcd d2
From what I understand, jsonlite picks the keys from the first row and expects the json in all rows to be in the same order.
I also tried this with different datatypes. Eg. If weightmap
was a complex datastructure like a list or another nested json, it would fail saying the columns types are not consistent.
Is this expected behavior ? Should'nt it always read from the keys and split to cols ?
Is there a way to make jsonlite parse the json correctly ?
I wonder if this is a "bug" in data.table
, where it is internally treating each as a list
instead of a data.frame
with names that need to be aligned. We can force it this way:
dt[, c(.SD, rbindlist(lapply(json, jsonlite::fromJSON), use.names=TRUE))][, json := NULL][]
# id user weightmap domains
# <int> <char> <char> <char>
# 1: 1 xyz2 w1 d1
# 2: 2 abcd w2 d2
If there is any complexity, what happens depends on the structure. For example, if a weightmap
is a list, then
dt <- data.table(id = 1:2,
json = c('{"user": "xyz2", "weightmap": ["w1","w2"], "domains": "d1"}',
'{"weightmap": "w2", "user": "abcd", "domains": "d2"}'))
dt[, c(.SD, rbindlist(lapply(json, jsonlite::fromJSON), use.names=TRUE))][, json := NULL][]
# Warning in as.data.table.list(jval, .named = NULL) :
# Item 1 has 2 rows but longest item has 3; recycled with remainder.
# Warning in as.data.table.list(jval, .named = NULL) :
# Item 2 has 2 rows but longest item has 3; recycled with remainder.
# id user weightmap domains
# <int> <char> <char> <char>
# 1: 1 xyz2 w1 d1
# 2: 2 xyz2 w2 d1
# 3: 1 abcd w2 d2
which achieves an "unnesting" (noisily).