How to write an arrow dataset based on a data.table grouping?

I have a dataset called df with year, month and day variables. I would like to use the write_dataset function to write a folder in the standard partitioned arrow dataset layout, with one top-level folder per year.

Within each year folder there will be month=1, month=2, and so on.

Now, in order to create this I have used the following code:

library(dplyr)

# write_dataset() partitions by the dplyr grouping columns by default
df <- df %>% group_by(year, month, day)
output_folder <- "my/path"
arrow::write_dataset(df,
                     output_folder,
                     format = "parquet")

However, my dataset is too large for this, and I would like to use data.table to take advantage of its fast grouping. My attempt at doing the same thing is the following:

library(data.table)

grouping_cols <- c("year", "month", "day")
# Sort and key df by the grouping columns (df must already be a data.table)
setkeyv(df, grouping_cols)

arrow::write_dataset(df,
                     output_folder,
                     format = "parquet")

However, the output is no longer partitioned: a single .parquet file is written, which does not take full advantage of arrow::write_dataset.

Is there a way to write the same dataset partitioned by the specified columns, but using data.table instead of dplyr groupings?

Solution

According to the docs, the partitioning argument of write_dataset defaults to dplyr::group_vars(dataset), i.e. the dataset's dplyr grouping columns. A data.table key is not automatically translated into dplyr groups, so you have to supply that argument yourself whenever the input is not a grouped dplyr object.

# Pass the partition columns explicitly instead of relying on dplyr groups
arrow::write_dataset(df,
                     output_folder,
                     partitioning = grouping_cols,
                     format = "parquet")
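
To check the result end to end, here is a minimal sketch on made-up data. It assumes the arrow and data.table packages are installed. The toy values, the value column and the temporary output folder are illustrative stand-ins for the real df and "my/path", not something taken from the question.

```r
library(data.table)
library(arrow)

# Toy data standing in for the real df (all values are invented)
df <- data.table(
  year  = c(2020L, 2020L, 2021L, 2021L),
  month = c(1L, 2L, 1L, 1L),
  day   = c(5L, 6L, 7L, 8L),
  value = c(1.5, 2.5, 3.5, 4.5)
)

grouping_cols <- c("year", "month", "day")
setkeyv(df, grouping_cols)  # optional: keys/sorts df, has no effect on partitioning

# Temporary folder used here in place of "my/path"
output_folder <- file.path(tempdir(), "partitioned_dataset")
unlink(output_folder, recursive = TRUE)  # start from an empty folder

# The data.table key is ignored by arrow, so name the partition columns explicitly
write_dataset(df,
              output_folder,
              format = "parquet",
              partitioning = grouping_cols)

# Relative paths of the files that were written, one per year/month/day combination
written_files <- list.files(output_folder, recursive = TRUE)
print(written_files)

# Reopen the folder as a dataset; the partition columns are recovered from the paths
ds <- open_dataset(output_folder)
print(ds$num_rows)
```

The listed paths should show the nested year=/month=/day= folders, which confirms that the partitioning came from the explicit argument rather than from the key.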