Loading multiple parquet files into R from URL (Dropbox folder)


I'm trying to load multiple parquet files from the URL of my Dropbox folder (I have not synced those files to my local machine, to save disk space). I used the following code, but it returns nothing.

    library(arrow)
    library(dplyr)

    files <- list.files(path = "https://www.dropbox.com/sh/g8ck3t859uahkdi/AADw-kp7EYfU-SMZc4mmtCM2a?dl=1", pattern = "*.parquet", full.names = T)

    tbl <- sapply(files, read_parquet, simplify=FALSE) %>% 
      bind_rows(.id = "id")

I've looked at this and this post, but couldn't figure out how to do it.

I'm using a Windows machine for this task (do I need to set mode to "wb"?), but I can switch to a Mac if need be.


Solution

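  • list.files() only lists directories on the local file system, so pointing it at a URL returns an empty character vector and nothing gets read. If we use the second option of downloading to a destination folder, then the shared folder can be downloaded as a zip file, unzipped, and read from there: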

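    library(arrow)
    library(purrr)
    url <- "https://www.dropbox.com/sh/g8ck3t859uahkdi/AADw-kp7EYfU-SMZc4mmtCM2a?dl=1"
    # Adjust these two placeholder paths to a folder on your machine
    filezip <- "/path/to/yourfolder/filenew.zip"
    new_folder <- "/path/to/yourfolder/filenew"
    # mode = "wb" writes the zip as binary, which is needed on Windows
    download.file(url, filezip, mode = "wb")
    unzip(filezip, exdir = new_folder)
    # List the extracted parquet files with their full local paths
    files <- list.files(path = new_folder,
                        pattern = "\\.parquet$", full.names = TRUE)
    # Read each file and row-bind the results into one data frame
    tbl <- map_dfr(files, read_parquet)
    nrow(tbl)
    #[1] 168019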
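
The question's code also tried to keep an "id" column recording which file each row came from, which map_dfr(files, read_parquet) drops. A minimal sketch of one way to keep it, assuming the files are already unzipped into a local folder: the helper name read_parquet_folder and the demo files a.parquet and b.parquet are hypothetical and only there so the example runs without the Dropbox download, and it uses lapply() with dplyr::bind_rows() in place of map_dfr().

```r
library(arrow)
library(dplyr)

# Hypothetical helper (not part of the original answer): read every parquet
# file in a local folder and record the source file name in an "id" column.
read_parquet_folder <- function(folder) {
  files <- list.files(path = folder, pattern = "\\.parquet$", full.names = TRUE)
  # Name each path by its file name; bind_rows() copies these names into "id"
  names(files) <- basename(files)
  bind_rows(lapply(files, read_parquet), .id = "id")
}

# Demo on two small made-up files written to a temporary folder
demo_dir <- file.path(tempdir(), "parquet_demo")
dir.create(demo_dir, showWarnings = FALSE)
write_parquet(data.frame(x = 1:2), file.path(demo_dir, "a.parquet"))
write_parquet(data.frame(x = 3:5), file.path(demo_dir, "b.parquet"))

tbl <- read_parquet_folder(demo_dir)
nrow(tbl)
```

With the real data, calling read_parquet_folder(new_folder) after the unzip() step should give the same rows as the answer above plus the extra "id" column.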