Search code examples
sqlrdelete-rowrodbc

How can I delete records from an ODBC database based on a dataframe in R


I have a dataframe in R that has a list of records. I would like to delete all of the matching records from my ODBC datasource.

    data_to_delete ##This is my dataframe

    delete from RODBC datasource where record_id IN (select record_id 
                                                    from data_to_delete)

Any ideas on how to achieve this? I am currently using sqlQuery to pull data down and sqlSave to create new records, but I'm not sure how to delete records.


Solution

  • As the answers to this related question explain, you will need to build up the string that holds theDELETE statement programmatically, and then you can execute a DELETE statement just as any other statement within sqlQuery.

    So at the end of your DELETE, don't try to "select" from the R variable, rather just dump its contents into a comma-separated string. Below, I assume that whatever data_to_delete is, we can use paste with as.character to spit it into a comma-separated string. But you can modify that as needed.

    I also assume that you mean for datasource to refer to the database, and I use some_table_name as a placeholder for the table's name within that database. You should adjust the values for your specific case.

    del_query <- sprintf("DELETE from some_table_name
                          where record_id IN (%s)",
                         paste(as.character(data_to_delete), 
                               collapse=", ")
                 )
    
    sqlQuery(datasource, del_query)
    

    Using this method, you need to be careful if the content of your data is large. There is often a symbol limit for the number of distinct symbols that can appear in the IN clause -- something like 2000 symbols as default in SQL Server.

    If you need more values than that, you could first issue a query that writes the values into a temp table wholly on the database, and then issue a query with sqlQuery that contains the extra IN ( SELECT * FROM ... ) but you'll be querying from that temp table, not "from" the R variable.