I'd like to crossjoin two streams of tuples in Cascading. Let's suppose there are two lists: ladies and gentlemen, and the goal is to write all the possible lady-gentleman combinations out to a file (e.g. all the possible matches from the "women seeking men" section of a hypothetical dating website).
I found a similar example on this blog and attempted to tweak the code to make a crossjoin (see https://github.com/alexwoolford/cascading-crossjoin-stackoverflow-question).
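To make the goal concrete, the desired output is simply the Cartesian product of the two lists. Below is a minimal plain-Java sketch of that result, with no Cascading involved. The class name CrossJoinSketch, the crossJoin helper and the sample names are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CrossJoinSketch {

    // Pair every lady with every gentleman (Cartesian product of the two lists).
    public static List<String[]> crossJoin(List<String> ladies, List<String> gentlemen) {
        List<String[]> pairs = new ArrayList<>();
        for (String lady : ladies) {
            for (String gentleman : gentlemen) {
                pairs.add(new String[] {lady, gentleman});
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for the two input files.
        List<String> ladies = Arrays.asList("Alice", "Beth");
        List<String> gentlemen = Arrays.asList("Carl", "Dan");

        // One tab-separated line per lady-gentleman combination.
        for (String[] pair : crossJoin(ladies, gentlemen)) {
            System.out.println(pair[0] + "\t" + pair[1]);
        }
    }
}
```

With m ladies and n gentlemen this yields m × n rows, which is the output the Cascading flow is expected to write to the file.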
The operate method in the Crossjoin class throws a NullPointerException. First, the getJoinerClosure() call in this line returns null:
JoinerClosure joinerClosure = bufferCall.getJoinerClosure();
... and then the if statement that immediately follows tries to get the size of null:
if( joinerClosure.size() != 2 )
[...]
... resulting in a null-pointer exception.
Can you see where I'm going wrong?
It worked when I removed the declared-fields argument, new Fields("lady", "gentleman"), from the new CoGroup constructor call, i.e. changed from:
Pipe pipeLadiesAndGentlemen = new CoGroup(pipeLadies, Fields.NONE, pipeGentlemen, Fields.NONE, new Fields("lady", "gentleman"), new BufferJoin());
... to:
Pipe pipeLadiesAndGentlemen = new CoGroup(pipeLadies, Fields.NONE, pipeGentlemen, Fields.NONE, new BufferJoin());