Search code examples
rrodbc

Data from ODBC blob not matching return from SQL query


I’m reading a BLOB field from an ODBC data connection (the BLOB field is a file). I connect and query the database, returning the blob and the filename. The blob itself does not contain the same data as I find in the database however. My code is as follows along with the data returned vs in the DB.

library(RODBC)

sqlret<-odbcConnect('ODBCConnection')
qry<-'select content,Filename from document with(nolock) where documentid = \'xxxx\'' 
df<-sqlQuery(sqlret,qry)
close(sqlret)

rootpath<-paste0(getwd(),'/DocTest/')

dir.create(rootpath,showWarnings = FALSE)

content<-unlist(df$content)
fileout<-file(paste0(rootpath,df$Filename),"w+b")
writeBin(content, fileout)
close(fileout)

database blob is

0x50726F642050434E203A0D0A35363937313533320D0A33383335323133320D0A42463643453335380D0A0D0A574C4944203A0D0A0D0…

the dataframe’s content is

00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000004b020000000000000d0000f1000000008807840200000000d0f60c0c0000…

The filenames match up, as does the size of the content/blob.


Solution

  • The exact approach you take may vary depending on your ODBC driver. I'll demonstrate how I do this on MS SQL Server, and hopefully you can adapt it to your needs.

    I'm going to use a table in my database called InsertFile with the following definition:

    CREATE TABLE [dbo].[InsertFile](
      [OID] [int] IDENTITY(1,1) NOT NULL,
      [filename] [varchar](50) NULL,
      [filedata] [varbinary](max) NULL
    ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
    
    GO
    

    Now let's create a file that we will push into the database.

    file <- "hello_world.txt"
    write("Hello world", file)
    

    I need to do a little work to prep the byte code for this file to go into SQL. I use this function for that.

    prep_file_for_sql <- function(filename){
      bytes <- 
        mapply(FUN = readBin,
               con = filename,
               what = "raw",
               n = file.info(filename)[["size"]],
               SIMPLIFY = FALSE) 
      chars <- 
        lapply(X = bytes,
               FUN = as.character)
      vapply(X = bytes,
             FUN = paste,
             collapse = "",
             FUN.VALUE = character(1))
    
    }
    

    Now, this is a bit strange, but the SQL Server ODBC driver is pretty good at writing VARBINARY columns, but terrible at reading them.

    Coincidentally, the SQL Server Native Client 11.0 ODBC driver is terrible at writing VARBINARY columns, but okay-ish with reading them.

    So I'm going to have two RODBC objects, conn_write and conn_read.

    conn_write <- 
      RODBC::odbcDriverConnect(
        paste0("driver=SQL Server; server=[server_name]; database=[database_name];",
               "uid=[user_name]; pwd=[password]")
      )
    
    conn_read <- 
      RODBC::odbcDriverConnect(
        paste0("driver=SQL Server Native Client 11.0; server=[server_name]; database=[database_name];",
               "uid=[user_name]; pwd=[password]")
      )
    

    Now I'm going to insert the text file into the database using a parameterized query.

    sqlExecute(
      channel = conn_write,
      query = "INSERT INTO dbo.InsertFile (filename, filedata) VALUES (?, ?)",
      data = list(file,
                  prep_file_for_sql(file)),
      fetch = FALSE
    )
    

    And now to read it back out using a parameterized query. The unpleasant trick to use here is recasting your VARBINARY property as a VARBINARY (don't ask me why, but it works).

    X <- sqlExecute(
      channel = conn_read,
      query = paste0("SELECT OID, filename, ",
                     "CAST(filedata AS VARBINARY(8000)) AS filedata ",
                     "FROM dbo.InsertFile WHERE filename = ?"),
      data = list("hello_world.txt"),
      fetch = TRUE,
      stringsAsFactors = FALSE
    )
    

    Now you can look at the contents with

    unlist(X$filedata)
    

    And write the file with

    writeBin(unlist(X$filedata),
             con = "hello_world2.txt")
    

    BIG DANGEROUS CAVEAT

    You need to be aware of the size of your files. I usually store files as a VARBINARY(MAX), and SQL Server isn't very friendly about exporting those through ODBC (I'm not sure about other SQL Engines; see RODBC sqlQuery() returns varchar(255) when it should return varchar(MAX) for more details)

    The only way I've found to get around this is to recast the VARBINARY(MAX) as a VARBINARY(8000). That obviously is a terrible solution if you have more than 8000 bytes in your file. When I need to get around this, I've had to loop over the VARBINARY(MAX) column and created multiple new columns each of length 8000, and then paste them all together in R. (check out: Reconstitute PNG file stored as RAW in SQL Database)

    As of yet, I've not come up with a generalized solution to this problem. Perhaps that's something I should spend more time on, though.