I'm reading the Cascading documentation, chapter 5.2 "Functions", and I wonder what will happen with the following code. Should it work correctly in a multithreaded environment? The more general question is: can a Function be invoked from multiple threads? As far as I know, a single mapper is single-threaded.
Specifically, I've tested code like this and it seems to me that it is not thread-safe. Maybe I don't properly understand the documentation on page 39.
public class NotThreadSafeObject {
    ...
    public void doSomething() {
        // update state
    }
    public String getValue() {
        // returns value from state
    }
}
public class SomeFunction extends BaseOperation<NotThreadSafeObject> implements Function<NotThreadSafeObject>
{
    // constructors
    @Override
    public void prepare( FlowProcess flowProcess, OperationCall<NotThreadSafeObject> call )
    {
        // create a reusable stateful object once and keep it in the context
        call.setContext( new NotThreadSafeObject() );
    }
    @Override
    public void operate( FlowProcess flowProcess, FunctionCall<NotThreadSafeObject> call )
    {
        // ...
        NotThreadSafeObject obj = call.getContext();
        obj.doSomething();
        // emit a one-element tuple holding the current value
        Tuple tup = new Tuple( obj.getValue() );
        call.getOutputCollector().add( tup );
    }
    @Override
    public void cleanup( FlowProcess flowProcess, OperationCall<NotThreadSafeObject> call )
    {
        // release the stateful object
        call.setContext( null );
    }
}
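Based on the Cascading documentation, this should work fine; holding state like this is in fact the primary reason to use the Context in a non-aggregating operation. The object is created in prepare() and read back through the call in operate(), so the state lives on the call's context rather than in a field of the Function instance itself.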
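To illustrate why a per-call context avoids sharing, here is a minimal plain-Java sketch. It does not use the Cascading API: Counter, Task, and runTasks are hypothetical stand-ins for NotThreadSafeObject, a running task, and the framework's scheduling. It assumes each context is created in prepare() and only ever touched by the thread that created it, which is the property the pattern in the question relies on.

```java
import java.util.ArrayList;
import java.util.List;

class ContextSketch {

    // Hypothetical stand-in for NotThreadSafeObject: an unsynchronized counter.
    static class Counter {
        private int count;
        void doSomething() { count++; }
        int getValue() { return count; }
    }

    // Hypothetical stand-in for one task with its own prepare/operate/cleanup cycle.
    static class Task implements Runnable {
        private final int records;
        private Counter context; // per-task state, never shared with other tasks
        private int result;

        Task(int records) { this.records = records; }

        void prepare() { context = new Counter(); }
        void operate() { context.doSomething(); }
        void cleanup() { result = context.getValue(); context = null; }

        @Override
        public void run() {
            prepare();
            for (int i = 0; i < records; i++) {
                operate();
            }
            cleanup();
        }

        int result() { return result; }
    }

    // Runs each task on its own thread and returns the per-task counts.
    static int[] runTasks(int tasks, int recordsPerTask) {
        List<Task> work = new ArrayList<>();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            Task task = new Task(recordsPerTask);
            Thread thread = new Thread(task);
            work.add(task);
            threads.add(thread);
            thread.start();
        }
        try {
            for (Thread thread : threads) {
                thread.join(); // join makes each task's result visible to this thread
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        }
        int[] results = new int[tasks];
        for (int i = 0; i < tasks; i++) {
            results[i] = work.get(i).result();
        }
        return results;
    }

    public static void main(String[] args) {
        for (int value : runTasks(4, 100_000)) {
            System.out.println(value); // each task reports 100000
        }
    }
}
```

If the four tasks instead shared a single Counter, the unsynchronized count++ could lose updates. If your test showed thread-safety problems, it may be worth checking whether the stateful object is reachable from anywhere other than the context, for example a static or instance field.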