amazon-web-services, amazon-emr, apache-hudi

Error consuming records caused by SdkInterruptedException when inserting into Hudi Table


I have a Hudi table that I created from a migration, so it has billions of rows. There were no problems during the migration, but as soon as I started a streaming job to write fresh data to this table, these errors occurred:


ERROR - error producing records (org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$0(BoundedInMemoryExecutor.java:94)):94
org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file s3://lake/tables/hudi/usage_fact_cpaas_by_month/organization_id=AAABBBCCC/year=2020/month=12/5235f14e-85b4-488e-99f4-9eb416532795-1_3-134-785_20201216202753.parquet

...

[2020-12-29 16:45:18,284] ERROR - error reading records from queue (org.apache.hudi.common.util.queue.BoundedInMemoryQueue.readNextRecord(BoundedInMemoryQueue.java:201)):201
java.lang.InterruptedException

I did the same thing for another migrated table and there were no problems. The only difference between the two tables is the partitioning.

The job runs on AWS and uses Hudi 0.5.3.

Has anyone faced this problem? I'm not sure whether this is a Hudi problem or an AWS problem.


Solution

  • I looked at the executor logs and found that the root cause was a schema error: the incoming records did not match the schema of the Parquet files written during the migration. The SdkInterruptedException and InterruptedException messages were only secondary symptoms of the failed write.
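
As a diagnostic sketch (assuming PySpark on the same cluster; the file path is the one from the stack trace, and incoming_df is a hypothetical handle to the streaming micro-batch, not something from the original post), reading the affected Parquet file directly and diffing its schema against the incoming records makes this kind of mismatch visible before the upsert:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-schema-debug").getOrCreate()

# File path copied from the stack trace; any file slice in the affected
# partition would work equally well.
bad_file = (
    "s3://lake/tables/hudi/usage_fact_cpaas_by_month/"
    "organization_id=AAABBBCCC/year=2020/month=12/"
    "5235f14e-85b4-488e-99f4-9eb416532795-1_3-134-785_20201216202753.parquet"
)

# Schema that the migration actually wrote to S3.
existing = spark.read.parquet(bad_file)
existing.printSchema()

def diff_schemas(old, new):
    """Print fields whose presence or type differs between two StructTypes."""
    old_fields = {f.name: f.dataType.simpleString() for f in old.fields}
    new_fields = {f.name: f.dataType.simpleString() for f in new.fields}
    for name in sorted(set(old_fields) | set(new_fields)):
        if old_fields.get(name) != new_fields.get(name):
            print(f"{name}: existing={old_fields.get(name)} "
                  f"incoming={new_fields.get(name)}")

# `incoming_df` stands for the streaming batch that failed to upsert
# (hypothetical name); compare it against what is already on disk.
# diff_schemas(existing.schema, incoming_df.schema)

If the schemas differ, aligning the streaming batch to the table schema before writing avoids the opaque ParquetDecodingException at merge time. Newer Hudi releases also expose a hoodie.avro.schema.validate write option that fails fast on incompatible schemas; whether it is available in 0.5.3 would need to be checked against that version's documentation.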