Search code examples
rsql-serverodbcr-markdownrnotebook

Push an R dataframe to SQL as a new table


I'm trying to make my analysis workflow more efficient connecting my SQL Server Database to R-Studio.

What I do now?

  1. I develop my queries using MS SQL Management Studio to extract relevant information for the analysis.
  2. I export the query result to a .csv UTF-8 file.
  3. I read the file in R-Studio and perform analyses.
  4. I export the final dataframe to a .csv UTF-8 file.
  5. I use this file to create a table back in SQL Management Studio.

What am I trying to do?

I want to create a connection between SQL Server and R-Studio so I can directly import/export my dataframe/table without having these files in my computer.

To do that I have followed these steps:

  1. Installed DBI and odbc packages in R.
  2. Configured the connection using "ODBC Data Source Connection Administrator"
    • My connection name is "Enterprise" (SQL Server Native Client 10.0)
  3. In R-Studio, using the Connection tab, I can seen a list of data sources containing "Enterprise".
    • Choosing "Enterprise" data source create an R object with the name "con" that contains the connection information.
    • At this point I can see all databases in the connection tab.
  4. My code for testing in R is the following:
---
title: "R Notebook"
output: html_notebook
---
```{r setup, include=FALSE}
library(odbc)
con <- dbConnect(odbc::odbc(), "Enterprise", timeout = 10) ```

```{sql, connection = con, output.var = "DataFrame"}
SELECT TOP 1000 *
FROM [DB].[dbo].[Table] ```

Data<-DataFrame[1:100,1:10]
head(Data,11)```

Var1    Var2    Var3    Var4    Var5    Var6    Var7    Var8    Var9    Var10
   A       1    2.4      5.5    13.1    30.8    72.4    170.5   401.6   945.7
   B       2    4.7     11.1    26.1    61.5    144.8   341.1   803.2   1891.5
   C       3    7.1     16.6    39.2    92.3    217.3   511.6   1204.8  2837.2
   D       4    9.4     22.2    52.2    123.0   289.7   682.2   1606.4  3782.9
   E       5    11.8    27.7    65.3    153.8   362.1   852.7   2008.0  4728.6
   F       6    14.1    33.3    78.4    184.5   434.5   1023.2  2409.6  5674.4
   G       7    16.5    38.8    91.4    215.3   506.9   1193.8  2811.2  6620.1
   H       8    18.8    44.4    104.5   246.0   579.4   1364.3  3212.8  7565.8
   I       9    21.2    49.9    117.5   276.8   651.8   1534.9  3614.4  8511.5
   J       10   23.5    55.5    130.6   307.5   724.2   1705.4  4016.0  9457.3
   K       11   25.9    61.0    143.6   338.3   796.6   1875.9  4417.6  10403.0
```{sql, connection = con}
SELECT * INTO [DB].[dbo].[New_Table]
FROM ?Data```

Here I'm using the ? to indicate that "Data" is an R dataframe object in the SQL Chunk.

Error in vapply(values, function(x) { : values must be length 1,
 but FUN(X[[1]]) result is length 100
Failed to execute SQL chunk

My guess is I'm doing something wrong in the last chunk.

What will be the proper way to push "Data" to my data base?


Solution

  • How I solved it?

    I was on the right track until I got to the part of pushing my dataframe to the DB. @r2evans pointed out that I was trying to create a table from the server with information that only existed locally with this code:

    ```{sql, connection = con}
    SELECT * INTO [DB].[dbo].[New_Table]
    FROM ?Data```
    

    At this point, I changed the method and tried to pushed the dataframe using r-code:

    ```{r}
    dbWriteTable(con, "[DB].[dbo].[New_Table]", Data)```
    

    The result was the following error:

    nanodbc/nanodbc.cpp:1587: 42000: [Microsoft][SQL Server Native Client 10.0][SQL Server]CREATE TABLE permission denied in database 'master'.

    The problem here: I was trying to access to the database I have the permission using "[DB].[dbo].[New_Table]". The dbWriteTable function understand this parameter just as a pure name for the table, so I had to find a way to specify the right path to the database.

    I did not find the way to do it using code directly. Solution: I went to the "ODBC Data Sources (32-bit)" (Windows APP); and changed the configuration of my connection. The connection I had, was taking the master as the default database. I changed it for the database I have the permission. So now I'm using the following code to push the table from R-notebooks:

    ```{r}
    dbWriteTable(con, "[R_Test2]", Data, overwrite = T)```
    

    It's working perfectly for what I need.

    If someone know how to specify this path using code, will be good to know. Maybe in the first configuration chunk:

    ```{r setup, include=FALSE}
    library(odbc)
    library(DBI)
    con <- dbConnect(odbc::odbc(), "Enterprise", timeout = 10)```