Tags: amazon-web-services, scala, apache-spark, amazon-s3, aws-glue

Can we remove column names from the S3 partition path and use only the values?


I am just curious: when writing from Spark with a Glue sinkFormat, is it possible to save the file as "2021/05/05/filename.parquet" rather than "year=2021/month=05/day=05/filename.parquet"? I tried playing with 'writepath', but it works at the record level, and I believe it would break Spark's ability to save partitioned files.


Solution

  • This is not possible.

    Partitioning drops the partition columns from the data files themselves. Spark relies on the directory structure for partition discovery, so the directory names must follow the "column=value" form, column names included, for it to work.
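To make the point above concrete, here is a minimal sketch using the plain Spark DataFrameWriter in local mode rather than a Glue sink. The object name, helper name, sample rows, and temp directory are all made up for illustration. It writes a tiny dataset partitioned by year/month/day, lists the top-level directory names Spark produced, and then reads the root path back so the partition columns are rebuilt from the directory names.

```scala
import java.io.File
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Hypothetical demo object; not part of Spark or Glue.
object PartitionLayoutSketch {

  // Writes a small partitioned dataset under `dir` and returns
  // (top-level directory names, column names seen when reading back).
  def writeAndReadBack(spark: SparkSession, dir: String): (Seq[String], Seq[String]) = {
    import spark.implicits._

    // Illustrative sample rows (assumption, not from the question).
    val df = Seq(
      ("2021", "05", "05", "a"),
      ("2021", "05", "06", "b")
    ).toDF("year", "month", "day", "payload")

    // partitionBy moves year/month/day out of the data files and into
    // directory names of the form column=value.
    df.write.mode("overwrite").partitionBy("year", "month", "day").parquet(dir)

    val topLevelDirs = new File(dir).listFiles()
      .filter(_.isDirectory)
      .map(_.getName)
      .toSeq

    // Reading the root path triggers partition discovery, which restores
    // the partition columns from the column=value directory names.
    val columns = spark.read.parquet(dir).columns.toSeq

    (topLevelDirs, columns)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("partition-layout-sketch")
      .getOrCreate()

    val dir = Files.createTempDirectory("partition-sketch").toString
    val (dirs, cols) = writeAndReadBack(spark, dir)
    println(s"directories: ${dirs.mkString(", ")}")
    println(s"columns:     ${cols.mkString(", ")}")

    spark.stop()
  }
}
```

If the directories were named "2021/05/05" instead, the reader would have no column names to attach to those values, which is why the "column=value" layout matters for partition discovery.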