apache-hudi

Apache Hudi Upsert/Insert/Deletes at the same time


Can we run the Upsert and Delete write operations at the same time on the same table?

Will the Apache Hudi metadata get corrupted?

If this is not possible, please suggest any other way to achieve the same result.

Thanks in advance!


Solution

  • With Hudi, you can upsert and delete records in the same query without corrupting the Hudi metadata. To achieve this, you have two options:

    • Develop your own payload class, set it through hoodie.datasource.write.payload.class, and implement the logic there, so that records are deleted based on some condition (for example, when a null value is provided, or based on a column value).
    • Add the column _hoodie_is_deleted to your source dataset, set it to true for the records you want to delete, and leave it null (or false) for the records you want to upsert. Then write with save mode Append and the write operation set to upsert.
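
    The second option can be sketched in PySpark roughly as follows. This is a minimal illustration, not a definitive implementation: the table name demo_table, the columns id, name, part and ts, and the helper names are all assumptions made for the example, and it presumes a Spark session that already has the Hudi bundle on its classpath.

```python
def upsert_delete_options(table_name, record_key, precombine_field, partition_field):
    """Hudi writer options for an upsert; column names are supplied by the caller."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.partitionpath.field": partition_field,
    }


def flag_deletes(rows, keys_to_delete, key_index=0):
    """Append the _hoodie_is_deleted value (True = delete, False = upsert) to each row."""
    return [row + (row[key_index] in keys_to_delete,) for row in rows]


def write_batch(spark, base_path, rows, keys_to_delete):
    """Write one batch that mixes upserts and deletes (hypothetical schema)."""
    # An explicit schema keeps _hoodie_is_deleted typed as BOOLEAN.
    schema = "id INT, name STRING, part STRING, ts LONG, _hoodie_is_deleted BOOLEAN"
    df = spark.createDataFrame(flag_deletes(rows, keys_to_delete), schema=schema)
    (
        df.write.format("hudi")
        .options(**upsert_delete_options("demo_table", "id", "ts", "part"))
        .mode("append")
        .save(base_path)
    )
```

    Calling write_batch(spark, path, rows, {2}) would then upsert every row except the one whose id is 2, which is written with the delete flag set to true.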

    Update:

    If you want to run them as two separate queries, they are treated as two concurrent writers. You can enable OCC (optimistic concurrency control), which allows concurrent writes as long as they do not overlap (for example, a DELETE from partition X and an INSERT into partition Y). When both queries write to the same data, the conflict is detected at commit time and the conflicting write fails and has to be retried.
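
    As a rough sketch of what enabling OCC looks like, each writer would add options along these lines to its Hudi write. The ZooKeeper-based lock provider is only one possible choice, and the host, port, lock key and base path passed in are placeholders assumed for the example.

```python
def occ_options(zk_url, zk_port, lock_key, zk_base_path):
    """Extra writer options that turn on optimistic concurrency control.

    Assumes a ZooKeeper ensemble is available for locking; all argument
    values are placeholders chosen by the caller.
    """
    return {
        "hoodie.write.concurrency.mode": "optimistic_concurrency_control",
        # Failed writes are cleaned up lazily so one writer does not roll back another.
        "hoodie.cleaner.policy.failed.writes": "LAZY",
        "hoodie.write.lock.provider":
            "org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider",
        "hoodie.write.lock.zookeeper.url": zk_url,
        "hoodie.write.lock.zookeeper.port": str(zk_port),
        "hoodie.write.lock.zookeeper.lock_key": lock_key,
        "hoodie.write.lock.zookeeper.base_path": zk_base_path,
    }


def with_occ(writer_options, **zk):
    """Merge OCC settings into an existing dict of Hudi writer options."""
    merged = dict(writer_options)
    merged.update(occ_options(**zk))
    return merged
```

    Both the upsert job and the delete job would need to use the same lock settings, since the lock is what lets them coordinate their commits.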