Search code examples
rsqlitedplyrdbplyr

Mutate function rounds values down, why?


I took database from this lesson and tried to perform some manipulations with it. Here is my code:

install.packages(c("dbplyr", "RSQLite"))
dir.create("data_raw", showWarnings = FALSE)
download.file(url = "https://ndownloader.figshare.com/files/2292171",
              destfile = "data_raw/portal_mammals.sqlite", mode = "wb")
library(dplyr)
library(dbplyr)
mammals <- DBI::dbConnect(RSQLite::SQLite(), "data_raw/portal_mammals.sqlite")
test <- tbl(mammals, "surveys")%>%
        select(record_id, month)%>%
        mutate(ratio=record_id/month)
test

I can`t understand why "ratio" variable is rounded down. Can someone explain this?


Solution

  • My guess would be that the columns you are taking the ratio of, record_id and month, are type integer. This means the answer is being returned in type integer, so it rounds down/discards the decimal places.

    No, this is not the default R behaviour. But it is the default database behaviour. Why does this matter? When using dbplyr, the calculations are not done in R but are translated from R to SQL and executed in the database.

    The fix is straightforward: start the calculation with the decimal 1.0:

    test <- tbl(mammals, "surveys")%>%
            select(record_id, month)%>%
            mutate(ratio = 1.0 * record_id / month)
    

    This works because 1 and 1.0 are different to the database. 1 is a integer type and 1.0 is decimal type. Because the calculation of ratio now starts with a decimal it should now return a decimal.