Flink hash join error: exceeded maximum number of recursions

What's happening: I am getting the following error while joining two datasets in Flink:

Hash join exceeded maximum number of recursions, without reducing partitions enough to be memory resident. Probably cause: Too many duplicate keys.

I have two datasets, one large and one small, so I used the join hint REPARTITION_HASH_SECOND, but I am still facing the same issue.

Can anyone explain the root cause of this exception?

Solution

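Data skew can occur when a small dataset is joined with a large one. The hash join then has to repartition the data many times, and I suspect your problem is related to that. Records that share a join key always land in the same partition, so if a single key has more records than fit in memory, repeated repartitioning cannot shrink that partition. This matches the "Too many duplicate keys" hint in the error message.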
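One way to check whether duplicate keys are the cause is to count how often each join key occurs on the side used to build the hash table. The following is a minimal plain-Java sketch of that check on a sample of keys; it does not use the Flink API, and the class name KeySkewCheck, the method names, and the sample data are all hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for inspecting a sample of join keys; not part of Flink.
class KeySkewCheck {

    /** Counts how many records share each join key. */
    public static <K> Map<K, Long> keyFrequencies(List<K> keys) {
        Map<K, Long> counts = new HashMap<>();
        for (K key : keys) {
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }

    /** Returns the fraction of all records that carry the most frequent key. */
    public static <K> double maxKeyShare(List<K> keys) {
        if (keys.isEmpty()) {
            return 0.0;
        }
        long max = 0;
        for (long count : keyFrequencies(keys).values()) {
            max = Math.max(max, count);
        }
        return (double) max / keys.size();
    }

    public static void main(String[] args) {
        // Made-up sample of join keys taken from the build side of the join.
        List<String> sampleKeys = List.of("a", "a", "a", "a", "b", "c");
        System.out.println("frequencies: " + keyFrequencies(sampleKeys));
        System.out.println("share of most frequent key: " + maxKeyShare(sampleKeys));
    }
}
```

If one key accounts for a large share of the records, a hash-based strategy is likely to keep failing on that key. In that case it may be worth trying a sort-merge strategy (JoinHint.REPARTITION_SORT_MERGE in the DataSet API) or filtering out the dominant key before the join, though whether either helps depends on your data.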
