install.packages("arrow")
library(arrow)
library(data.table)  # needed for rbindlist()

a <- arrow::open_dataset("a.parquet")
b <- arrow::open_dataset("b.parquet")

# Materialise each dataset as a data frame, then row-bind them
a1 <- as.data.frame(a)
b1 <- as.data.frame(b)
merged <- rbindlist(list(a1, b1))
I'm looking for a quick way to combine both datasets while they are still in Arrow format, ideally without converting them to data frames at all.
The arrow package supports reading multiple parquet files at once, which may achieve what you are after (see the note below about partitioning from @r2evans). Assuming the datasets have an identical schema, you can open multiple files in a single call to open_dataset(), and they will then be treated as if they were a single dataset, e.g.
library(arrow)
library(dplyr)

# Write two copies of iris to temporary parquet files
file1 <- tempfile()
file2 <- tempfile()
write_parquet(iris, file1)
write_parquet(iris, file2)

# Open both files as a single dataset
files <- c(file1, file2)
x <- open_dataset(files)

# Queries run across both files: 2 x 150 = 300 rows
x |>
  select(Sepal.Length) |>
  nrow()
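If the goal is to persist the combined data back to disk in Parquet format without ever pulling it into R as a data frame, write_dataset() can stream the dataset out directly. A minimal sketch (out_dir is a hypothetical output location):

# Stream the combined dataset to a directory of parquet files,
# without collecting it into R
out_dir <- tempfile()
write_dataset(x, out_dir, format = "parquet")

# Re-open the combined copy to confirm the row count
open_dataset(out_dir) |> nrow()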
Full details can be found in the arrow dataset vignette.
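If you instead need the combined data as a single in-memory Arrow Table rather than an on-disk dataset, a sketch using concat_tables() (again assuming both files share a schema):

# Read each file as an Arrow Table, skipping the data frame conversion
t1 <- read_parquet(file1, as_data_frame = FALSE)
t2 <- read_parquet(file2, as_data_frame = FALSE)

# Concatenate into a single Table, still in Arrow format
combined <- concat_tables(t1, t2)
combined$num_rows  # 300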