Read only parquet column names in R

I am looking to get only the column names from a Parquet dataset (with partitioning) using the arrow package in R, ideally as a character vector. I can do this with collect(), but on larger datasets with multiple partitions and many files it takes much longer than expected, because collect() reads all of the data into memory. Here is an example of what I have and what I hope to achieve.

Create a partitioned Parquet dataset (some of my datasets are partitioned on more than one column)

arrow::write_dataset(mtcars, "C:/Data/parquet/mtcars", format = "parquet", partitioning = c("cyl"))
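With the default hive-style partitioning, write_dataset() removes the cyl column from the Parquet files and encodes its values in directory names instead, which is why cyl later shows up as the last column. A small sketch to inspect that layout, assuming a temporary directory in place of C:/Data/parquet/mtcars (the out_dir name is only for illustration):

```r
library(arrow)

# Write to a temporary directory so the example does not touch C:/Data
out_dir <- file.path(tempdir(), "mtcars")
write_dataset(mtcars, out_dir, format = "parquet", partitioning = c("cyl"))

# Each distinct value of cyl gets its own "cyl=<value>" subdirectory
files <- list.files(out_dir, recursive = TRUE)
print(files)
```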

Current way to get parquet column names

library(dplyr)  # provides %>% and collect()

colnames(arrow::open_dataset(sources = "C:/Data/parquet/mtcars") %>%
  dplyr::collect())

Result of using colnames with collect

[1] "mpg"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear" "carb" "cyl"

I suspect there is a more efficient way to get the column names without calling collect(). The end goal is a character vector like the one above. I am open to other options and ideas.

Solution

According to the documentation, a Dataset object has a schema field, from which you can get the column names.

Something like this should work:

arrow::open_dataset(sources = "C:/Data/parquet/mtcars")$schema$names

This reads only the dataset's metadata (the list of files and the schema), so it should be much faster than loading all the data.
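A self-contained sketch that checks the schema-based approach against the collect() result, again assuming a temporary directory (out_dir is a hypothetical name). The partition field cyl is appended after the columns stored in the files, so the vector should match the one in the question. By default open_dataset() inspects only the first file's schema; if files in a multi-file dataset might have different columns, passing unify_schemas = TRUE makes it inspect every file's schema, which costs more but still should not read any row data.

```r
library(arrow)

# Build the same partitioned dataset in a temporary directory
out_dir <- file.path(tempdir(), "mtcars_names")
write_dataset(mtcars, out_dir, format = "parquet", partitioning = c("cyl"))

# open_dataset() discovers the files and reads schema metadata; no rows are loaded
ds <- open_dataset(sources = out_dir)
cols <- ds$schema$names
print(cols)

# Optional: unify the schemas of all files, useful if they may differ
cols_unified <- open_dataset(sources = out_dir, unify_schemas = TRUE)$schema$names

# The slower approach from the question, for comparison only (reads all rows)
cols_collected <- colnames(dplyr::collect(ds))
```

Only the first two statements after write_dataset() are needed in practice; the collect() line is there just to confirm both approaches give the same vector.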