I would like to use dplyr and RMySQL to work with my large data. There are no issues with the dplyr code; the problem (I think) is with exporting the data out of MySQL to R. My connection is dropped every time, even though I am using n = Inf in collect(). Theoretically, my data should have more than 50K rows, but I can only get around 15K back. Any suggestions are appreciated.
library(dplyr)
library(RMySQL)

# Connect to the database and reference the table
my_db <- src_mysql(dbname = "aermod_1", host = "localhost", user = "root", password = "")
my_tbl <- tbl(my_db, "db_table")

# Drop unneeded columns, then pull the result into R
out_summary_station_raw <- select(my_tbl, -c(X, Y, AVERAGE_CONC))
out_station_mean_local <- collect(out_summary_station_raw)
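Since the connection seems to drop mid-transfer, one avenue is to inspect (and raise) the MySQL session timeouts before collecting. Here is a minimal diagnostic sketch using DBI directly, with the same connection details as above; the 600-second values are arbitrary assumptions:

library(DBI)
library(RMySQL)

# Open a plain DBI connection with the same credentials as above
con <- dbConnect(RMySQL::MySQL(), dbname = "aermod_1",
                 host = "localhost", user = "root", password = "")

# Inspect the current timeouts; net_write_timeout is the usual suspect
# when a long result transfer gets cut off partway through
dbGetQuery(con, "SHOW VARIABLES LIKE '%timeout%'")

# Raise the limits for this session only (600 s is an arbitrary choice)
dbExecute(con, "SET SESSION net_write_timeout = 600")
dbExecute(con, "SET SESSION wait_timeout = 600")

dbDisconnect(con)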
I also tried going through the pool package:
library(pool)
library(RMySQL)
library(dplyr)

# Create a pooled connection to the same database
pool <- dbPool(
  drv = RMySQL::MySQL(),
  dbname = "aermod_1",
  host = "localhost",
  username = "root",
  password = ""
)

out_summary_station_raw <- src_pool(pool) %>% tbl("aermod_final") %>% select(-c(X, Y, AVERAGE_CONC))
out_station_mean_local <- collect(out_summary_station_raw, n = Inf)
Warning messages:
1: In dbFetch(res, n) : error while fetching rows
2: Only first 15,549 results retrieved. Use n = Inf to retrieve all.
I checked the log and it looks fine from the server side. For my example, the slow log said Query_time: 79.348351 Lock_time: 0.000000 Rows_sent: 15552 Rows_examined: 16449696, but collect() just could not retrieve the full data. I am able to reproduce the same behavior using MySQL Workbench.
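As a fallback, the rows could also be fetched in chunks through DBI instead of a single collect(), so that a partial transfer at least fails at a known offset. A rough sketch, assuming the same table as the pool example and an arbitrary chunk size of 5,000 rows:

library(DBI)
library(RMySQL)

con <- dbConnect(RMySQL::MySQL(), dbname = "aermod_1",
                 host = "localhost", user = "root", password = "")

# Stream the result set instead of materializing it in one call;
# the unwanted columns can be dropped in R afterwards
res <- dbSendQuery(con, "SELECT * FROM aermod_final")

chunks <- list()
while (!dbHasCompleted(res)) {
  # If the transfer dies, the error now points at a specific chunk
  chunks[[length(chunks) + 1]] <- dbFetch(res, n = 5000)
}
dbClearResult(res)
out_station_mean_local <- do.call(rbind, chunks)

dbDisconnect(con)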
After the most recent RMySQL update, I noticed that I could not collect() data from large views, and I reported it as an issue. Your problem might be related. One thing to try is rolling back to the previous version:
devtools::install_version("RMySQL", version = "0.10.9",
repos = "http://cran.us.r-project.org")
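After reinstalling, restart your R session and confirm the downgrade took effect with packageVersion("RMySQL").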