Search code examples
rdata.tablebinningjsonlite

Assign categories from JSON column in data table


I have a data.table that looks like dt (provided at the end), which includes columns EI and CutsLabsCV.

CutsLabsCV is a JSON of two values indicating cuts and labels that should be used for classifying my data set.

I need to create a new column EIRange, that classifies the EI column based on cuts and labels provided on CutsLabsCV.

The expected result should be:

          EI                                    CutsLabsCV EIRange
       <num>                                        <list>  <fctr>
 1: 101.9163 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
 2: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
 3: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
 4: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
 5: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
 6: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
 7: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
 8: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
 9: 109.5220 ["-Inf",85,120,"Inf"],["<85","85-120",">120"]  85-120
10: 109.5220 ["-Inf",85,120,"Inf"],["<85","85-120",">120"]  85-120

How can I achieve this in data.table? I need the most efficient solution since my real dataset exceeds 2M rows.

Sample dataset:

dt <- data.table(EI = c(101.91625, 122.03178865, 122.03178865,
                        122.03178865, 122.03178865, 122.03178865, 122.03178865, 122.03178865,
                        109.521980125, 109.521980125),
                 CutsLabsCV = list(c("[\"-Inf\",90,130,\"Inf\"]", "[\"<90\",\"90-130\",\">130\"]"),
                                   c("[\"-Inf\",90,130,\"Inf\"]", "[\"<90\",\"90-130\",\">130\"]"),
                                   c("[\"-Inf\",90,130,\"Inf\"]", "[\"<90\",\"90-130\",\">130\"]"),
                                   c("[\"-Inf\",90,130,\"Inf\"]", "[\"<90\",\"90-130\",\">130\"]"),
                                   c("[\"-Inf\",90,130,\"Inf\"]", "[\"<90\",\"90-130\",\">130\"]"),
                                   c("[\"-Inf\",90,130,\"Inf\"]", "[\"<90\",\"90-130\",\">130\"]"),
                                   c("[\"-Inf\",90,130,\"Inf\"]", "[\"<90\",\"90-130\",\">130\"]"),
                                   c("[\"-Inf\",90,130,\"Inf\"]", "[\"<90\",\"90-130\",\">130\"]"),
                                   c("[\"-Inf\",85,120,\"Inf\"]", "[\"<85\",\"85-120\",\">120\"]"),
                                   c("[\"-Inf\",85,120,\"Inf\"]", "[\"<85\",\"85-120\",\">120\"]")))

Solution

  • One possible solution :

    dt[,EIRange:=sapply(CutsLabsCV,\(x) jsonlite::fromJSON(x[2])[2])]
    dt
    
    #           EI                                    CutsLabsCV EIRange
    #        <num>                                        <list>  <char>
    #  1: 101.9163 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
    #  2: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
    #  3: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
    #  4: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
    #  5: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
    #  6: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
    #  7: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
    #  8: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
    #  9: 109.5220 ["-Inf",85,120,"Inf"],["<85","85-120",">120"]  85-120
    # 10: 109.5220 ["-Inf",85,120,"Inf"],["<85","85-120",">120"]  85-120