Search code examples
mysqlsqlrrmysqlr-dbi

Insert NA values in Mysql table RMySQL


I am trying to insert a data frame row in a mysql table, but I have NA values in character and numeric columns. I'm getting this error: Error in .local(conn, statement, ...) : could not run statement: Unknown column 'NA' in 'field list'

This is my query:

sql <- sprintf("insert into payment (amount,payment_type,date,customer_total,base_price, p2c_total, partner_total, pay_online, pay_at_lot,tax,first_name_on_card,last_name_on_card,address)
                  values (%f, %d, '%s',%f,%f,%f,%f,%f,%f,%f,'%s','%s','%s');",
                 payments[i,]$amount,payments[i,]$payment_type,payments[i,]$date, payments[i,]$customer_total,
                 payments[i,]$base_price, payments[i,]$p2c_total, payments[i,]$partner_total,
                 payments[i,]$pay_online,payments[i,]$pay_at_lot,payments[i,]$tax,
                 payments[i,]$first_name_on_card, payments[i,]$last_name_on_card, payments[i,]$address)
rs <- dbSendQuery(db, sql[i])
dbClearResult(rs)
    

This is the sql code:

insert into reservation (reservation_number, driver_name, number_passengers, checkin_datetime, checkout_datetime, days, reservation_date, reservation_email,id_reservation_status, id_payment, id_ship, id_facility, id_user) values ('j990j','CB', 4, '2020-01-12 10:00:00', '2020-01-19 10:30:00', 8, 'NA', '[email protected]',NA, 1, 2, 547, 6);

And this is the mysql error: #1054 - La columna 'NA' en field list es desconocida

MySQL version: 8.0.27

R version: 4.03

RMySQL package: 0.10.22


Solution

  • Three ways to look at this:

    1. Don't sprintf/paste data into a query string. In addition to security concerns about malicious SQL injection (e.g., XKCD's Exploits of a Mom aka "Little Bobby Tables"), it is also a concern for malformed strings or Unicode-vs-ANSI mistakes, even if it's one data analyst running the query.

      Conveniently, there's a function that takes care of inserting data from a data.frame into the table in a safer way: dbAppendTable. You might be able to do just

      dbAppendTable(db, "payment", payments[i,])
      

      if all of the columns need to be inserted, otherwise something more verbose is necessary:

      dbAppendTable(db, "payment", payments[i,c("amount", "payment_type", "date", "customer_total", "base_price", "p2c_total", "partner_total", "pay_online", "pay_at_lot", "tax", "first_name_on_card", "last_name_on_card", "address")])
      

      If you're planning on doing this for more than 1 row, then dbAppendTable can take multiple rows with no issue.

    2. If you really want to do one row at a time with your own insert statement, then I strongly urge you to use parameterized queries, perhaps something like:

      qry <- "insert into payment (amount,payment_type,date,customer_total,base_price, p2c_total, partner_total, pay_online, pay_at_lot,tax,first_name_on_card,last_name_on_card,address)
                        values (?, ?, ?,?,?,?,?,?,?,?,?,?,?);"
      dbExecute(db, qry, params = payments[i, c("amount", "payment_type", ...)])
      

      (That reminds me ... dbExecute is a nice wrapper that does dbSendStatement always followed by dbClearResult. There's also dbGetQuery which is really dbSendQuery always followed by dbClearResult, returning the data. You aren't returning rows from the table, so the first is preferred anyway.)

      NOTE: this feature requires an up-to-date driver for accessing the database. If you're using RMySQL then there is a problem: that package has not seen substantive updates in years (as of now) and does not support parameterized queries. I believe the RMariaDB package is both fully compatible with MySQL and it supports parameterized queries.

    3. If you must really do this manually (and really, I discourage it strongly, too many times I thought I could work around the risks, only to be bitten each time), then R's NA translates into null (no quotes!). To do this, you need to conditionally add quotes. Something like:

      ifelse(is.na(payments[i,]$date), "null", sQuote(payments[i,]$date))
      

      for each string-like field in your query, and make sure to change '%s' to %s in your format. There are almost certainly better ways to automate this so that you aren't typing a dozen or more ifelses, but in my opinion it really is not worth the pain of doing that.

      (If you're relying on different semantics of sprintf("%s", ..) versus the implicit string-ification with sQuote, then you may need even more elbow-grease there.)