Tags: apache-spark, cross-join

How to implement "Cross Join" in Spark?


We plan to move Apache Pig code to the new Spark platform.

Pig has a "Bag/Tuple/Field" concept and behaves similarly to a relational database. Pig provides support for CROSS/INNER/OUTER joins.

For CROSS JOIN, we can use:

    alias = CROSS alias, alias [, alias …] [PARTITION BY partitioner] [PARALLEL n];

But as we move to the Spark platform, I can't find a counterpart in the Spark API. Do you have any ideas?


Solution

  • Use oneRDD.cartesian(anotherRDD), which returns the Cartesian product (i.e., the cross join) of the two RDDs.
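A minimal sketch in Scala, assuming an existing SparkContext named sc (the variable names and sample data here are illustrative only):

```scala
// Cross join (Cartesian product) of two RDDs via RDD.cartesian,
// assuming an already-initialized SparkContext `sc`.
val left  = sc.parallelize(Seq(1, 2, 3))
val right = sc.parallelize(Seq("a", "b"))

// cartesian pairs every element of `left` with every element of `right`,
// producing an RDD[(Int, String)] with 3 * 2 = 6 elements.
val crossed = left.cartesian(right)

crossed.collect().foreach(println)
```

Note that the result has |left| × |right| elements, so cartesian can be very expensive on large RDDs; partition and filter early where possible.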