Tags: r, parquet, apache-arrow

Error: Invalid: Unrecognized filesystem type in URI when loading parquet file from url using arrow package


I'm pretty new to the Parquet file format, and I'm using read_parquet() (from the arrow package) to load a Parquet file (stored in my Dropbox shared folder) into R. However, I received the following error message:

library(arrow)
df <- read_parquet("https://www.dropbox.com/s/mysgf4sojpjgyp7/part-394.parquet?dl=1")

Error: Invalid: Unrecognized filesystem type in URI: https://www.dropbox.com/s/mysgf4sojpjgyp7/part-394.parquet?dl=1

What might be causing this problem, and do I need to partition the URL beforehand?


Solution

  • The file reading functions in the arrow package do not yet support HTTP[S] URIs. We hope to add this feature in a future release (ARROW-7594). In the meantime:

    If you have Dropbox installed on the computer where you're running this, use the local path to the file instead of the HTTPS URI.

    If you do not have Dropbox installed, then download the file first, like this:

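    library(arrow)

    # Download the file to a temporary location first
    myfile <- tempfile()
    download.file(
      "https://www.dropbox.com/s/mysgf4sojpjgyp7/part-394.parquet?dl=1",
      myfile,
      mode = "wb"  # binary mode, so the file is not corrupted on Windows
    )
    df <- read_parquet(myfile)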
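
If you need to do this for several files, the download-then-read steps can be wrapped in a small function. This is only a sketch: read_parquet_url() is a hypothetical helper name, not something the arrow package provides, and the reader argument is an assumption added so that the same pattern can be reused with other file readers.

```r
# Hypothetical helper (not part of arrow): download a remote file to a
# temporary path, read it, and remove the temporary copy afterwards.
read_parquet_url <- function(url, reader = arrow::read_parquet, ...) {
  tmp <- tempfile(fileext = ".parquet")
  on.exit(unlink(tmp), add = TRUE)  # clean up the temp file on exit

  # mode = "wb" writes in binary mode, which matters on Windows
  status <- download.file(url, tmp, mode = "wb", quiet = TRUE)
  if (status != 0) {
    stop("Download failed for ", url)
  }

  # Extra arguments in ... are passed through to the reader
  reader(tmp, ...)
}
```

With the arrow package installed, it could be called as df <- read_parquet_url("https://www.dropbox.com/s/mysgf4sojpjgyp7/part-394.parquet?dl=1"). Note that the Dropbox link still needs dl=1 so that the raw file is downloaded rather than the preview page.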