What's happening: I am getting following error while joining two datasets in flink:
Hash join exceeded maximum number of recursions, without reducing partitions enough to be memory resident. Probably cause: Too many duplicate keys.
I have two datasets, one large and one small, so I have used the join.Hint as Repartition
hash second but still i am facing the same issue.
can anyone explain me the root cause of this exception?
Data skew can occur when "jion" occurs in small and large datasets. There's going to be a lot of rezoning, and I have a feeling that your problem might be related to that.