
Flink error: "Hash join exceeded maximum number of recursions"


I am getting the following error while joining two datasets in Flink:

Hash join exceeded maximum number of recursions, without reducing partitions enough to be memory resident. Probably cause: Too many duplicate keys.

I have two datasets, one large and one small, so I used the join hint JoinHint.REPARTITION_HASH_SECOND, but I am still facing the same issue.

Can anyone explain the root cause of this exception?


Solution

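  • Data skew can occur when you join a small dataset with a large one. The error message itself points to too many duplicate keys: records with the same key always hash into the same partition, so repartitioning cannot shrink that partition enough to fit in memory. There is going to be a lot of repartitioning, and I have a feeling that your problem might be related to that.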
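
To illustrate why duplicate keys can lead to this kind of error, here is a toy model in plain Java. It is only a sketch and not Flink's actual implementation: the class name, the fan-out of 4, the recursion limit of 3, and the digit-based `bucket` function are all assumptions made for the example. It splits a list of keys into partitions, recurses on any partition that is still too large, and gives up once the recursion limit is reached.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class RecursiveHashPartitionSketch {

    // Hypothetical limits chosen for this toy model; they are not Flink's real values.
    static final int FAN_OUT = 4;        // partitions created per recursion level
    static final int MAX_RECURSION = 3;  // give up after this many levels

    // Returns true if the keys can be split into partitions of at most
    // memoryLimit records without exceeding MAX_RECURSION levels.
    static boolean fitsInMemory(List<Integer> keys, int memoryLimit, int level) {
        if (keys.size() <= memoryLimit) {
            return true;   // small enough to be memory resident
        }
        if (level >= MAX_RECURSION) {
            return false;  // the point where a real join would give up
        }
        List<List<Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < FAN_OUT; i++) {
            partitions.add(new ArrayList<>());
        }
        for (int key : keys) {
            partitions.get(bucket(key, level)).add(key);
        }
        for (List<Integer> partition : partitions) {
            if (!fitsInMemory(partition, memoryLimit, level + 1)) {
                return false;
            }
        }
        return true;
    }

    // Stand-in for "a different hash function per level": uses the
    // level-th base-FAN_OUT digit of the key. Equal keys always share a bucket.
    static int bucket(int key, int level) {
        int divisor = 1;
        for (int i = 0; i < level; i++) {
            divisor *= FAN_OUT;
        }
        return Math.floorMod(Math.floorDiv(key, divisor), FAN_OUT);
    }

    public static void main(String[] args) {
        List<Integer> distinct = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            distinct.add(i);
        }
        List<Integer> skewed = Collections.nCopies(1000, 7);

        System.out.println(fitsInMemory(distinct, 100, 0)); // true
        System.out.println(fitsInMemory(skewed, 100, 0));   // false
    }
}
```

With 1000 distinct keys, the partitions shrink below the limit after two splits. With 1000 copies of the same key, every split puts all records into one partition again, so the recursion limit is hit without any progress. If your data looks like the second case, checking how many records share each join key would be a reasonable first step.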