Search code examples
rdatabasedbi

Adding new variables to a table in Azure from R?


I am storing data in Azure and plan to do a lot of the updates (adding, removing and changing records) from R. This is because I've already written extensive code in R and don't currently have the time to re-write it all in SQL.

This approach is mostly fine, as I can use the dbReadTable(), dbWriteTable() and dbAppendTable() functions from the DBI package to push and pull data. However, I can't find a way to add variables to an Azure table from R. I can do it from Microsoft SQL Server, but ideally would want to do it from R.

I've tried using dbWriteTable() to add the columns but it attempts to write a new table. When I use dbAppendTable(), I can easily add rows, but when I try to add data with the columns I want added, I get an error.

Any suggestions are helpful!


Solution

  • You have three choices that I can think of in this case, as you've correctly identified that dbAppendTable is used for adding rows.

    1. As I suggested in the comments, you can use dbWriteTable to write the entire table with all columns that will be eventually populated into the database and then perform updates on those columns. This is actually a best practice, and I'd suggest doing this if you know the names of the columns you want to eventually end up with in the final SQL Server table.
    2. You can either add additional columns to a dataframe/tibble in R, perform all updates to those columns in the R dataframe itself, and then use dbWriteTable to write your finalized table with all columns to the database (this is the same thing as 1 above, only you're performing updates on the dataframe first and then writing the final dataframe to SQL Server. In 1 above, you're writing the final table to the database with empty columns and then performing updates to the SQL Server table to populate the empty columns as opposed to an R dataframe).
    3. You can use DBI's dbExecute function to alter the table and add columns in your SQL database (and optionally update values in the newly added columns with an update statement).

    For option 3 above, you would need to execute some code as follows, to add, say a varchar(20) column called new_col to the table named MY_TABLE:

    dbExecute(
      con,
      "ALTER TABLE dbo.MY_TABLE ADD mew_col VARCHAR(20) NULL"
    )