Search code examples
dockerperformancesqlitememoryrsqlite

Using R to Query a SQLite file Help: DBI::dbGetQuery() never completes, very slow


I have a SQLite file that contains a table with 9 million or so rows and 30+ columns. Up until a few days ago, the following code worked fine:

path <- file.path(basepath, "Rmark_web", "SQL_Data", "2020Q3_Enterprise_Exposure_Wind.sqlite")
cn <- DBI::dbConnect(RSQLite::SQLite(), path)
df <- DBI::dbGetQuery(cn, "select Longitude, Latitude, City, State, Total_Value, GridID, new_LOB from [2020Q3_Enterprise_Exposure_Wind] where State in ('GA')")
DBI::dbDisconnect(cn)

When I run the code that contains this query on my local machine, it takes some time but it does finish. I am currently trying to run it in a docker image with the following metrics:

docker run --memory=10g --rm -d -p 8787:8787     -v /efs:/home/rstudio/efs     -e ROOT=TRUE     -e DISABLE_AUTH=true     myrstudio        

Is there a way to debug the RSQLite package? Is there another way to perform this query without using this package? The rest of the code runs fine, but it gets held up on this specific step and usually does not finish (especially if it is the 2nd or 3rd time that this piece of code runs in the docker image).

The number of states to include in the query changes from run to run.


Solution

  • If you have this issue, be sure to remove any columns you are not using from the SQL file. In the end, I loaded it as a postgres database online and that seems to fix the issue I was experiencing. Here is the new query for anyone's benefit.

    library(RPostgres)
    library(RPostgreSQL)
    library(DBI)
    db <- 'airflow_db'  #provide the name of your db
        
    host_db <- [omitted for privacy] 
        
    db_port <- [omitted for privacy]  # or any other port specified by the DBA
        
    db_user <- [omitted for privacy]  
        
    db_password <- Sys.getenv("DB_PASS")
        
    con <- dbConnect(RPostgres::Postgres(), dbname = db, host=host_db, port=db_port, user=db_user, password=db_password)
    query <- paste('select * from weather_report.ent_expo_data where "State" in (', in_states_clause,')', sep='')
    print(query)
    df <- dbGetQuery(con, query)