I am running a script in R and then need its results to be sent to SQL Server.
During the first run I used this code:
sqlSave(conn, df, tablename = "myschema.result_table", rownames=FALSE,
append=FALSE, varTypes=varTypes)
So the table was created and everything worked fine. Now I would like to update the results in a following way:
Id
exists in result_table
, update the contents.Id
doesn't exist in result_table
, append the row with this Id
.The problem is, if I use:
sqlUpdate(conn, df, tablename = "myschema.result_table", index = "Id", fast = FALSE)
I get an error as soon as the script finds an Id
not existing in result_table
. When using sqlSave
with append=TRUE
, everything gets appended without further checks and I get double rows.
Is there any solution to this problem or do I have to create a new result table each time and later merge these tables in SQL?
There's a few simple ways to handle this:
1) Bring your existing IDs into R, extract existing IDs from your final dataframe, insert & update based on the two dataframes. Here's an example of how to update a table with a dataframe in R:
How to update rows in Database with values from data.frame in R conditionally
2) Dump all results into a table in SQL, then run a stored procedure that updates data for existing IDs, inserts data for non-existing IDs, then clears out said table.
UPDATE t
SET t.Col1 = r.Col1
FROM table t
JOIN tablefromr r --Existing
ON t.ID = r.ID
INSERT INTO table
SELECT r.*
FROM tablefromr r
LEFT JOIN table t
ON r.ID = t.ID
WHERE t.ID IS NULL --Not Existing
TRUNCATE TABLE tablefromr
3) Dump your results into a temp table in SQL (global, so you can access in R) - then do the same steps as #2.
Not sure if this is a one-off, or something you'll be doing constantly. The right answer depends on that. Hope I helped!