r, apache-arrow, duckdb

arrow::to_duckdb coerces int64 columns to doubles


arrow::to_duckdb() converts int64 columns to double in the resulting DuckDB table. This happens whether the .data being converted is an R data frame or a parquet file. How can I keep the int64 data type?

Example

library(arrow, warn.conflicts = FALSE)
library(tidyverse, warn.conflicts = FALSE)
library(vroom, warn.conflicts = FALSE)

# tibble with an int64 column
dd <- vroom(I("id\n9007199254740993\n"), col_types = "I", delim = ",")
dd
#> # A tibble: 1 × 1
#>        id
#>   <int64>
#> 1    9e15

# it's coerced to a double
to_duckdb(dd)
#> # Source:   table<arrow_001> [1 x 1]
#> # Database: DuckDB 0.8.1 [root@Darwin 22.5.0:R 4.3.1/:memory:]
#>        id
#>     <dbl>
#> 1 9.01e15
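
The coercion matters for this particular value because 9007199254740993 is 2^53 + 1, which a double cannot represent exactly. As a minimal sketch, independent of arrow and DuckDB and assuming only that the bit64 package is installed, the loss can be seen by comparing the integer64 value with its double conversion:

```r
library(bit64)

# 2^53 + 1 stored as a 64-bit integer: kept exactly
x <- as.integer64("9007199254740993")
as.character(x)
#> [1] "9007199254740993"

# the same value converted to a double: rounded to a neighbouring
# representable number, so the trailing 3 is lost
d <- as.double(x)
sprintf("%.0f", d)

# the two representations no longer agree
as.character(x) == sprintf("%.0f", d)
#> [1] FALSE
```

The printed tibbles above abbreviate the value (9e15 and 9.01e15), so the difference is easy to miss in the console output.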

Solution

  • According to ?to_duckdb, the con parameter defaults to arrow_duck_connection(). Its source shows that it creates a DuckDB DBI connection with

    con <- DBI::dbConnect(duckdb::duckdb())
    

    According to ?duckdb::duckdb, duckdb() has a bigint parameter that defaults to "numeric" and is documented as:

    How 64-bit integers should be returned, default is double/numeric. Set to integer64 for bit64 encoding.

    So we can set the con parameter of to_duckdb() to our own DBI connection with that parameter set to "integer64":

    da <- arrow::arrow_table(id = bit64::as.integer64("9007199254740993"))
    da
    #> Table
    #> 1 rows x 1 columns
    #> $id <int64>
    
    # default for comparison
    con1 <- DBI::dbConnect(duckdb::duckdb())
    # how we want it
    con2 <- DBI::dbConnect(duckdb::duckdb(bigint = "integer64"))
    
    # using default connection
    arrow::to_duckdb(da)
    #> # Source:   table<arrow_001> [1 x 1]
    #> # Database: DuckDB 0.8.1 [root@Darwin 22.6.0:R 4.3.1/:memory:]
    #>        id
    #>     <dbl>
    #> 1 9.01e15
    
    # comparison
    arrow::to_duckdb(da, con = con1)
    #> # Source:   table<arrow_002> [1 x 1]
    #> # Database: DuckDB 0.8.1 [root@Darwin 22.6.0:R 4.3.1/:memory:]
    #>        id
    #>     <dbl>
    #> 1 9.01e15
    
    # how we want it
    arrow::to_duckdb(da, con = con2)
    #> # Source:   table<arrow_003> [1 x 1]
    #> # Database: DuckDB 0.8.1 [root@Darwin 22.6.0:R 4.3.1/:memory:]
    #>        id
    #>   <int64>
    #> 1    9e15
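
Because the printed 9e15 is abbreviated, it can be worth confirming that the full value survives the round trip. The following is a sketch rather than a definitive check: it assumes arrow, duckdb, dplyr and bit64 are installed, and the names con, da and res are only illustrative. It collects the DuckDB table back into R and inspects the column:

```r
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

# connection that returns 64-bit integers as bit64::integer64
con <- DBI::dbConnect(duckdb::duckdb(bigint = "integer64"))

da <- arrow_table(id = bit64::as.integer64("9007199254740993"))

# register the Arrow table in DuckDB, then pull the rows back into R
res <- collect(to_duckdb(da, con = con))

# the column should still be integer64 and hold the full value
class(res$id)
as.character(res$id)
```

If the class includes "integer64" and the character form ends in 3, no precision was lost. The connection can be closed with DBI::dbDisconnect(con) once the table is no longer needed.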