RODBC is the main library in R to import data from a database into R. RODBC
seems to have the ability of "guess" the datatype of the column which I find it particularly annoying.
I have uploaded a file test.xls
here, or you may create a xls file yourself:
col_a
and the second column named col_b
.col_a
, I typed letters on this column for 92 rows'
)library(RODBC)
setwd("C:/Users/hke775/Documents/Enoch/MISC/R_problems/RODBC")
channel <- odbcConnectExcel("test.xls",readOnly=TRUE)
dummy.df <- sqlFetch(channel,"Sheet1")
odbcClose(channel)
You will see that in dummy.df
, col_b
is all NA
, the 1923
in this column is gone.
If you want to see the 1923
again, you can change the 1st row of col_b
to a number, and it is back again.
This is very annoying as I don't prefer modifying data manually. I need to use other package to do the xls importing, but I can't find other packages do as smooth as RODBC
(I tried gdata
and xlsReadWrite
).
Did I missing anything in the sqlFetch
command, and cause the trouble? Thanks.
Please don't blame R or RODBC for Microsoft's bugs... ;)
However, due to a bug in the ODBC driver, specifying the Rows to Scan (MaxScanRows) setting currently has no effect. In other words, the Excel ODBC driver (MDAC 2.1 and later) always scans the first 8 rows in the specified data source in order to determine each column's datatype.
For additional information about the Rows to Scan bug, including a simple workaround, click the article number below to view the article in the Microsoft Knowledge Base:
189897 XL97: Data Truncated to 255 Characters with Excel ODBC Driver
I tried the fix in KB189897 by setting the TypeGuessRows
value to 0 and look what happens!
> library(RODBC)
> channel <- odbcConnectExcel("test.xls",readOnly=TRUE)
> tail(dummy.df <- sqlFetch(channel,"Sheet1"))
col_a col_b
87 c NA
88 d NA
89 e NA
90 f NA
91 g NA
92 h 1923
> odbcClose(channel)
Please, no up-votes or check marks... just send cash. :)