Search code examples
mysqlrescapingdouble-quotesrmysql

Double Quotes in temporary JSON variable on MySQL using R


I have a table in MYSQL that contains the user interactions with a Web Page, I needed to extract the rows for the users where the date of that interaction is lower than a certain benchmark date and that benchmark date is different for each customer (I extract that date from a different database).

My approach was to set a json variable in which the key is a user and the value is the benchmark date, and used it in the query to extract the intended fields.

Example in R:

#MainDF contains the user and the benchmark date from a different database

json_str <- mapply(function(uid, bench_date){
                      paste0(
                            '{','"',cust,'"', ':', '"', bench_date, '"','}'
                             ) 
                                            }, MainDF[, 'uid'], 
                                               MainDF[, 'date'] 
                   )

json_str <- paste0("'", '[', paste0(json_str , collapse = ','), ']', "'")

temp_var <- paste('set @test=', json_str)

The intention was to make temp_var to be like:

set @test= '{"0001":"2010-05-05",
             "0012":"2015-05-05",
             "0101":"2018-07-20"}'

but it actually looks like :

set @test= '{\"0001\":\"2010-05-05\",
             \"0012\":\"2015-05-05\",
             \"0101\":\"2018-07-20\"}'

then create the main query:

main_Q <- "select user_id, date
           from interaction
           where 1=1
          and json_contains(json_keys(@test), concat('\"',user_id,'\"')) = 1
          and date <= json_unquote(json_extract(@test,  
                                             concat('$.','\"',user_id, '\"') 
                                                )
                                   )
          "             

For the execution, first, set the temporal variable and then execute the main query

dbSendQuery(connection, temp_var)
resp <- dbSendQuery(connection, main_Q )
target_df <- fetch(resp, n=-1)
dbClearResult(resp )

When I test a fraction of it in a SQL IDE it does works. However, in R it doesn't return anything. I think that the issue is that R escape the double quotes in temp_var and SQL end up reading

set @test= '{\"0001\":\"2010-05-05\",
             \"0012\":\"2015-05-05\",
             \"0101\":\"2018-07-20\"}'

which is not won't work. For example if I execute:

set @test= '{"0001":"2010-05-05",
             "0012":"2015-05-05",
             "0101":"2018-07-20"}'

select json_keys(@test)

it will return an array with the keys, but that is not the case with

set @test= '{\"0001\":\"2010-05-05\",
             \"0012\":\"2015-05-05\",
             \"0101\":\"2018-07-20\"}'

select json_keys(@test)

I am not sure how to solve the issue, but I need double quotes to specify the JSON. Is there any other approach that I should try or a way to make this work?


Solution

  • First, I think it is generally better to use a well-known library/package for converting to/from JSON, for several reasons.

    This gives you a string that you should be able to place just about anywhere.

    json_str <- jsonlite::toJSON(setNames(as.list(MainDF$date), MainDF$uid), auto_unbox=TRUE)
    json_str
    # {"0001":"2010-05-05","0012":"2015-05-05","0101":"2018-07-20"} 
    

    And while looking at the object on the R console will give the escaped-doublequotes,

    as.character(json_str)
    # [1] "{\"0001\":\"2010-05-05\",\"0012\":\"2015-05-05\",\"0101\":\"2018-07-20\"}"
    

    that is merely R's representation (shows all strings within double-quotes, and therefore needs to escape any double-quotes within the string).

    Adding it into some script should be straight-forward:

    cat(paste('set @test=', sQuote(json_str)), '\n')
    # set @test= '{"0001":"2010-05-05","0012":"2015-05-05","0101":"2018-07-20"}' 
    

    I'm assuming that having each on its own row is not critical. If it is, and indentation is important, perhaps this is more your style:

    spaces <- strrep(' ', 2+nchar('set @test = '))
    cat(paste0('set @test = ', sQuote(gsub(",", paste0(",\n", spaces), json_str))), '\n')
    # set @test = '{"0001":"2010-05-05",
    #               "0012":"2015-05-05",
    #               "0101":"2018-07-20"}' 
    

    Data:

    MainDF <- read.csv(stringsAsFactors=FALSE, colClasses='character', text='
    uid,date
    0001,2010-05-05
    0012,2015-05-05
    0101,2018-07-20')