
Crossjoin in Cascading


I'd like to crossjoin two streams of tuples in Cascading. Let's suppose there are two lists: ladies and gentlemen, and the goal is to write all the possible lady-gentleman combinations out to a file (e.g. all the possible matches from the "women seeking men" section of a hypothetical dating website).
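To make the intended output concrete, here is a minimal plain-Java sketch of the same crossjoin with no Cascading involved. The class name, method name, and sample names are all hypothetical and are not taken from the linked repository; it only illustrates the result the Cascading flow is meant to produce.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CrossJoinSketch {

    // Pairs every element of the first list with every element of the second list.
    static List<String[]> crossJoin(List<String> ladies, List<String> gentlemen) {
        List<String[]> pairs = new ArrayList<>();
        for (String lady : ladies) {
            for (String gentleman : gentlemen) {
                pairs.add(new String[] {lady, gentleman});
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, used only for illustration.
        List<String> ladies = Arrays.asList("Alice", "Beth");
        List<String> gentlemen = Arrays.asList("Carl", "Dan", "Ed");

        // Print one tab-separated lady/gentleman pair per line.
        for (String[] pair : crossJoin(ladies, gentlemen)) {
            System.out.println(pair[0] + "\t" + pair[1]);
        }
    }
}
```

With two ladies and three gentlemen this yields 2 x 3 = 6 pairs, which is the shape of output I expect in the file.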

I found a similar example on this blog and tried to adapt its code to perform a crossjoin (see https://github.com/alexwoolford/cascading-crossjoin-stackoverflow-question).

The operate method in the Crossjoin class throws a NullPointerException. First, the getJoinerClosure() call on this line returns null:

JoinerClosure joinerClosure = bufferCall.getJoinerClosure();

... and then the if statement that immediately follows calls size() on that null reference:

if( joinerClosure.size() != 2 )
    [...]

... which is what raises the NullPointerException.

Can you see where I'm going wrong?


Solution

  • It worked when I removed the declaredFields argument, new Fields("lady", "gentleman"), from the CoGroup constructor call, i.e. changed from:

    Pipe pipeLadiesAndGentlemen = new CoGroup(pipeLadies, Fields.NONE, pipeGentlemen, Fields.NONE, new Fields("lady", "gentleman"), new BufferJoin());
    

    ... to:

    Pipe pipeLadiesAndGentlemen = new CoGroup(pipeLadies, Fields.NONE, pipeGentlemen, Fields.NONE, new BufferJoin());