Search code examples
rdbplyr

Dbplyr error : Arithmetic overflow error converting expression to data type int


I'm pretty sure a similar query has been asked already. However my question is specific to when I connect an R session to my SQL Server database.

accept<-x%>% 

  group_by(SiteID, MachineID, LocationID) %>%

  filter(DateTime>="2019-01-1" & DateTime<"2019-12-31") %>%

  summarise(n=(sum(TenSecCount))) %>%

  collect()

When I try to collect the data into a data frame, I get the following error --

Arithmetic overflow error converting expression to data type int.
'SELECT "SiteID", "MachineID", "LocationID", CAST("n" AS VARCHAR(MAX)) AS "n"
FROM (SELECT TOP 100 PERCENT "SiteID", "MachineID", "LocationID", (SUM("TenSecCount")) AS "n"
FROM (SELECT TOP 100 PERCENT *
FROM (SELECT TOP 100 PERCENT "MachineID", "OutletID", "TenSecCount"

Any workarounds for this?


Solution

  • This error arises when you collect the data into R, because it is only at collection that your remote SQL table is evaluated.

    dbplyr defines your remote table by the SQL query that would return your results. Until you ask for results to be returned, your remote table definition is not much different from an SQL script waiting to be run.

    When you request results from that table, in your case using collect, the sql code is executed on the server and the results returned to R. This means you can have an invalid remote table definition, and not know it until you execute it. E.g.:

    remote_table <- server_df %>% 
      group_by(SiteID, MachineID, LocationID) %>%
      filter(DateTime>="2019-01-1" & DateTime<"2019-12-31") %>%
      summarise(n=(sum(TenSecCount)))
    # no error because all we have done is define an sql query and store it in remote_table
    
    # review underlying sql query
    show_query(remote_table)
    # if you copy & paste this query and try to run it directly on the server it will error
    
    # attempt to collect data
    local_table <- remote_table %>% collect()
    # error occurs on evaluation
    

    You can tell the error arose when the sql is evaluated because R returned to you the sql code that caused the error and the sql error message. "Arithmetic overflow error converting expression to data type int." is an SQL error, not an R error. See this question for ways to solve it.

    Hint, probably something like:

    remote_table <- server_df %>% 
      mutate(TenSecCount = CAST(TenSecCount AS BIGINT)) %>% # additional step changing data type
      group_by(SiteID, MachineID, LocationID) %>%
      filter(DateTime>="2019-01-1" & DateTime<"2019-12-31") %>%
      summarise(n=(sum(TenSecCount)))