apache-spark apache-iceberg

Is there a way to remove the files belonging to a partition from an Iceberg table without physically deleting them?


Iceberg provides add_files() to add files from a Hive table to an Iceberg table, but I cannot find a way to reverse that operation other than dropping the table and recreating it.

CALL spark_catalog.system.add_files(
  table => 'db.tbl',
  source_table => 'db.src_tbl',
  partition_filter => map('date', '2023-03-16', 'hour', '12')
);

Everything works as expected up to this step. But if I now try to add all files belonging to 2023-03-16, it complains that some files are duplicates:

java.lang.IllegalStateException: 
Cannot complete import because data files to be imported already exist within the target table: 
.../part-00000-d9d0137c-d7d6-46f5-b78a-9f68b977c7af.c000.zstd.parquet.  
This is disabled by default as Iceberg is not designed for multiple references to the same file within the same table.  
If you are sure, you may set 'check_duplicate_files' to false to force the import.

Obviously I don't want to add duplicates either. Is there a solution?


Solution

  • Summary from the community Slack thread.
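
Separately from the Slack thread, one way to sidestep the duplicate-file error in the question is to import only the hours that have not been added yet, so that no file is referenced twice and `check_duplicate_files` can stay at its default. The sketch below only builds the `CALL` statements as strings. The helper name `build_add_files_calls` is hypothetical, and it assumes that `hour` partition values are zero-padded two-digit strings (`'00'` to `'23'`), which the question does not confirm. How `add_files` behaves for an hour that has no files in the source table is not covered here.

```python
def build_add_files_calls(table, source_table, date, skip_hours,
                          catalog="spark_catalog"):
    """Build one add_files CALL per hour of `date`, skipping already-imported hours.

    Hypothetical helper: it only produces SQL strings and never touches Spark.
    """
    calls = []
    for h in range(24):
        # Assumption: hour partition values look like '00', '01', ..., '23'.
        hour = f"{h:02d}"
        if hour in skip_hours:
            continue  # this hour was already imported, so skip it
        calls.append(
            f"CALL {catalog}.system.add_files("
            f"table => '{table}', "
            f"source_table => '{source_table}', "
            f"partition_filter => map('date', '{date}', 'hour', '{hour}'))"
        )
    return calls


# Hour '12' of 2023-03-16 was imported earlier, so leave it out.
calls = build_add_files_calls("db.tbl", "db.src_tbl", "2023-03-16",
                              skip_hours={"12"})
```

With an active SparkSession named `spark`, the statements could then be run one by one with `for stmt in calls: spark.sql(stmt)`. To see which files the table already references before importing, Iceberg's `files` metadata table (for example `SELECT file_path FROM db.tbl.files`) can be queried.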