I've realised I have a huge amount of data split into files that are too small on HDFS. The reason is that I saved the data using too many partitioning keys. I therefore need to merge the data stored under one of those partitioning keys.
Fortunately, the partitioning key I want to remove is the last one (I don't know whether that makes it easier). So far I haven't found a solution other than a script that would take too long to run.
Here is an example of the HDFS layout I have:
/part1={lot_of_values}/part2={lot_of_values}/part_to_delete={lot_of_values}/{lot_of_files}.parquet
But I want to achieve:
/part1={lot_of_values}/part2={lot_of_values}/{lot_of_files}.parquet
That way I would have bigger files that load more quickly.
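To make the target layout concrete, here is a minimal sketch in plain Python of how the existing files would map onto the directories I want. It only builds a plan from a list of paths; it does not touch HDFS and does not merge any Parquet files. The function name `group_by_target_dir`, the parameter `key_to_drop`, and the sample values (`a`, `b`, `x`, `y`) are made up for illustration.

```python
from collections import defaultdict
from posixpath import basename, dirname


def group_by_target_dir(paths, key_to_drop="part_to_delete"):
    """Group source files by the directory they would land in once the
    last partition level (key_to_drop) is removed."""
    groups = defaultdict(list)
    for path in paths:
        parent = dirname(path)  # e.g. /part1=a/part2=b/part_to_delete=x
        if not basename(parent).startswith(key_to_drop + "="):
            raise ValueError(f"unexpected layout: {path}")
        # Strip the last partition level to get the target directory.
        groups[dirname(parent)].append(path)
    return dict(groups)


# Hypothetical sample paths following the layout above.
example = [
    "/part1=a/part2=b/part_to_delete=x/0.parquet",
    "/part1=a/part2=b/part_to_delete=y/0.parquet",
    "/part1=a/part2=c/part_to_delete=x/0.parquet",
]
plan = group_by_target_dir(example)
for target, sources in plan.items():
    print(target, "<-", len(sources), "file(s)")
```

Each key of `plan` is one target directory, and its value lists all the small files that would have to be combined there. Note that file names can collide across the removed level (both `0.parquet` files above end up in the same directory), so simply moving the files would not be enough.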