arrow::to_duckdb() converts int64 columns to double in the DuckDB table. This happens whether the .data being converted is an R data frame or a parquet file. How can I maintain the int64 data type?
Example
library(arrow, warn.conflicts = FALSE)
library(tidyverse, warn.conflicts = FALSE)
library(vroom, warn.conflicts = FALSE)
# tibble with an int64 column
dd <- vroom(I("id\n9007199254740993\n"), col_types = "I", delim = ",")
dd
#> # A tibble: 1 × 1
#> id
#> <int64>
#> 1 9e15
# it's coerced to a double
to_duckdb(dd)
#> # Source: table<arrow_001> [1 x 1]
#> # Database: DuckDB 0.8.1 [root@Darwin 22.5.0:R 4.3.1/:memory:]
#> id
#> <dbl>
#> 1 9.01e15
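The same coercion shows up when the data comes from a parquet file; a minimal sketch of that case (the write_parquet()/open_dataset() round trip below is just my own illustration):
# write the tibble to parquet and open it as an Arrow dataset
tf <- tempfile(fileext = ".parquet")
arrow::write_parquet(dd, tf)
ds <- arrow::open_dataset(tf)
# with the default connection, id again prints as <dbl>
to_duckdb(ds)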
If you look at ?to_duckdb, its con parameter defaults to arrow_duck_connection(), which in turn creates a DuckDB DBI connection with
con <- DBI::dbConnect(duckdb::duckdb())
If you look at ?duckdb::duckdb(), it has a bigint parameter which defaults to "numeric" and is documented as
How 64-bit integers should be returned, default is double/numeric. Set to integer64 for bit64 encoding.
So we can set the con parameter of to_duckdb() to our own DBI connection created with bigint = "integer64":
da <- arrow::arrow_table(id = bit64::as.integer64("9007199254740993"))
da
#> Table
#> 1 rows x 1 columns
#> $id <int64>
# default for comparison
con1 <- DBI::dbConnect(duckdb::duckdb())
# how we want it
con2 <- DBI::dbConnect(duckdb::duckdb(bigint = "integer64"))
# using default connection
arrow::to_duckdb(da)
#> # Source: table<arrow_001> [1 x 1]
#> # Database: DuckDB 0.8.1 [root@Darwin 22.6.0:R 4.3.1/:memory:]
#> id
#> <dbl>
#> 1 9.01e15
# comparison
arrow::to_duckdb(da, con = con1)
#> # Source: table<arrow_002> [1 x 1]
#> # Database: DuckDB 0.8.1 [root@Darwin 22.6.0:R 4.3.1/:memory:]
#> id
#> <dbl>
#> 1 9.01e15
# how we want it
arrow::to_duckdb(da, con = con2)
#> # Source: table<arrow_003> [1 x 1]
#> # Database: DuckDB 0.8.1 [root@Darwin 22.6.0:R 4.3.1/:memory:]
#> id
#> <int64>
#> 1 9e15
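If we then pull the result back into R, the bigint = "integer64" connection should preserve the exact value; a quick check using dplyr::collect(), which works on the lazy dbplyr table that to_duckdb() returns:
# collect the lazy duckdb table into R; with con2 the id column should come
# back as a bit64::integer64 rather than a double, so 9007199254740993 is exact
dplyr::collect(arrow::to_duckdb(da, con = con2))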