Tags: json, r, parsing, nested, string-parsing

Parsing nested structures in R


I have a JSON-like string that represents a nested structure. It is not real JSON, because the names and values are not quoted. I want to parse it into a nested structure, e.g. a list of lists.

#example:
x_string = "{a=1, b=2, c=[1,2,3], d={e=something}}"

The result should look like this:

x_list = list(a=1,b=2,c=c(1,2,3),d=list(e="something"))

Is there a convenient function I don't know about that does this kind of parsing?

Thanks.


Solution

  • If all of your data is formatted consistently, there is a simple solution using regular expressions and the jsonlite package. The code is:

    # Load jsonlite; if it is not installed, install it first and then load it.
    if (!require(jsonlite, quietly = TRUE)) {
        install.packages("jsonlite", repos = "https://ftp.heanet.ie/mirrors/cran.r-project.org")
        library(jsonlite)
    }

    x_string = "{a=1, b=2, c=[1,2,3], d={e=something}}"

    # Target: the valid JSON equivalent of x_string, which fromJSON() parses directly.
    json_x_string = "{\"a\":1, \"b\":2, \"c\":[1,2,3], \"d\":{\"e\":\"something\"}}"
    fromJSON(json_x_string)

    # Inner gsub: turn every "=" into ":" (any letters before it are kept).
    # Outer gsub: wrap every run of letters in double quotes.
    s <- gsub("([A-Za-z]+)", "\"\\1\"", gsub("([A-Za-z]*)=", "\\1:", x_string))

    fromJSON(s)
    

    The first section checks whether the package is installed. If it is, it loads it; otherwise it installs it and then loads it. I usually include this in any R code I write, to make it easier to move between machines and people.

    Your string is x_string. We want it to look like json_x_string, which gives the desired output when passed to fromJSON().

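    The regex is split into two steps because it's been a while since I wrote one; I'm pretty sure this could be made more elegant. Then again, it only works if your data is consistent, so I'll leave it like this for now. The first step (the inner gsub) changes "=" to ":", and the second (the outer gsub) adds quotation marks around every group of letters. Note that this quotes only runs of letters, so a name or value containing digits or underscores would be split up. Calling fromJSON(s) gives the output: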

    fromJSON(s)

    $a
    [1] 1

    $b
    [1] 2

    $c
    [1] 1 2 3

    $d
    $d$e
    [1] "something"
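
If your names or values can also contain digits or underscores, the letters-only pattern above will split them. Here is a rough sketch of a slightly more tolerant version. The function name parse_loose and the second example string are made up for illustration, and it assumes that values never contain spaces, "=" signs or quotes. It also quotes bare words such as true, false and null, so those come back as strings rather than logicals.

```r
library(jsonlite)

# Hypothetical helper (not part of jsonlite): convert the loose
# "{key=value}" notation to JSON text, then parse it with fromJSON().
parse_loose <- function(x) {
  # Step 1: every "=" becomes ":".
  s <- gsub("=", ":", x, fixed = TRUE)
  # Step 2: quote each bare token that starts with a letter or underscore.
  # The \\b word boundary keeps the pattern from matching inside a number
  # such as 1e5, so numbers are left unquoted.
  s <- gsub("\\b([A-Za-z_]\\w*)", "\"\\1\"", s, perl = TRUE)
  fromJSON(s)
}

# The string from the question.
x_list <- parse_loose("{a=1, b=2, c=[1,2,3], d={e=something}}")
str(x_list)

# An invented example with digits and underscores in names and values.
y_list <- parse_loose("{user_id=42, tags=[x1,y2], meta={file_name=report_2024}}")
str(y_list)
```

Anything messier than that (quoted strings, values with spaces, escaped characters) is probably better handled by a small hand-written parser than by regex.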