r, apache-arrow, feather

How to read column names and metadata from feather files in R arrow?


The (now-superseded) standalone feather library for R had a function called feather_metadata() that made it possible to read column names and types from feather files on disk without loading them. This was useful for selecting only specific columns when loading a feather file in R with read_feather(path, columns = c(...)).

Now that the feather format is part of the arrow library, feather_metadata() is no longer included.

Is there an equivalent function in arrow to read column names and types of files on disk from R before loading them?


Solution

  • In the current version of the arrow R package, there is no direct replacement for feather::feather_metadata(path), but there are two workarounds that might work for you:

    • If you just need the column names (not the data types), you can do this:

      rf <- arrow::ReadableFile$create(path)
      fr <- arrow::FeatherReader$create(rf)
      names(fr)
      
    • If you need the data types of the columns, you can try this:

      arrow::read_feather(path, as_data_frame = FALSE)
      

      This prints the column names together with their types, which is the kind of output you're looking for. It should be pretty fast because it does not convert the file to an R data frame. It does, however, read the full file (or at least memory-map it), so you might not want to do this if your Feather files are really large.
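
As a rough end-to-end sketch of the two workarounds above, the following writes a small example file, inspects its column names and types, and then loads only a subset of columns. The temporary file, its columns (id, name, score), and the variable names are made up for illustration. The sketch also assumes that column selection goes through the col_select argument of arrow::read_feather() rather than the columns argument of the old feather package.

```r
library(arrow)

# Hypothetical example file, written to a temporary path for illustration only.
path <- tempfile(fileext = ".feather")
write_feather(
  data.frame(id = 1:3, name = c("a", "b", "c"), score = c(0.5, 1.5, 2.5)),
  path
)

# Workaround 1: column names only, via a reader on the file.
rf <- ReadableFile$create(path)
fr <- FeatherReader$create(rf)
col_names <- names(fr)
rf$close()  # release the file handle once the names are extracted

# Workaround 2: column types, by reading the file as an Arrow Table
# (no conversion to an R data frame) and walking its schema.
tbl <- read_feather(path, as_data_frame = FALSE)
col_types <- vapply(tbl$schema$fields, function(f) f$type$ToString(), character(1))
names(col_types) <- names(tbl)

# Use the names to load only the columns of interest as a data frame.
wanted <- intersect(col_names, c("id", "score"))
df <- read_feather(path, col_select = tidyselect::all_of(wanted))
```

Note that only the first workaround avoids reading the data. For very large files it may be preferable to rely on names(fr) alone and skip the type lookup.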