Search code examples
sqlrparallel-processingrodbc

Parallel Database calls with RODBC


I am operating on chunks of panel data with about 3*10^6 rows and 11 columns. This data is stored in a SQL database (the data I use is a subset of an even larger dataset). I want to process chunks of the data (each date seperately) without importing the entire thing all at once, but the operations I have to run on each chunk are computationally intensive, so I would like to use the foreach package.

However, I am running into some problems with running the sql call in parallel. If I do a simple call such as (sqlQuery(channel, "select 1")) I can run that in parallel. If I do a more complex statement I get SIGPIPE errors. Has anyone else run into similar problems?


Solution

  • You did not tell us what database you were using. On a SQLServer connected with RODBC, if have successfully done this by using transactions.

    channel = odbcConnect(database)
    odbcSetAutoCommit(channel,FALSE)
    # Make requests here
    odbcEndTran(channel,TRUE) 
    

    I am not sure if this will also help when you have only read request as in your case, but it might be worth giving a try because it is only twp additional lines. I asked about the server, because transactions work nicely with MSSqlServer under ODBC, and I had some mixed results with others (but did not try hard).