I took database from this lesson and tried to perform some manipulations with it. Here is my code:
install.packages(c("dbplyr", "RSQLite"))
dir.create("data_raw", showWarnings = FALSE)
download.file(url = "https://ndownloader.figshare.com/files/2292171",
destfile = "data_raw/portal_mammals.sqlite", mode = "wb")
library(dplyr)
library(dbplyr)
mammals <- DBI::dbConnect(RSQLite::SQLite(), "data_raw/portal_mammals.sqlite")
test <- tbl(mammals, "surveys")%>%
select(record_id, month)%>%
mutate(ratio=record_id/month)
test
I can`t understand why "ratio" variable is rounded down. Can someone explain this?
My guess would be that the columns you are taking the ratio of, record_id
and month
, are type integer. This means the answer is being returned in type integer, so it rounds down/discards the decimal places.
No, this is not the default R behaviour. But it is the default database behaviour. Why does this matter? When using dbplyr, the calculations are not done in R but are translated from R to SQL and executed in the database.
The fix is straightforward: start the calculation with the decimal 1.0
:
test <- tbl(mammals, "surveys")%>%
select(record_id, month)%>%
mutate(ratio = 1.0 * record_id / month)
This works because 1
and 1.0
are different to the database. 1
is a integer type and 1.0
is decimal type. Because the calculation of ratio now starts with a decimal it should now return a decimal.