I was using Flink CEP module and wondering if I pass a function to where clause, which would be returning Boolean, whether it will work in distributed manner or not.
Example-:
val pattern= Pattern.start("begin").where(v=>booleanReturningFunction(v))
Will the above code work in distributed manner when submitted as flink job for CEP with simple condition.
Yuval already gave the correct answer in the comments but I'd like to expand on it:
Yes, any function that you provide can be run in a distributed fashion. First of all, as Yuval pointed out, all your code gets distributed on the compute cluster on job submission.
The missing piece is that also your job itself gets distributed. If you check the API, you see it in the interfaces:
public Pattern<T, F> where(IterativeCondition<F> condition) { ...
Pattern expects some condition. If you look at its definition, you can see the following
public abstract class IterativeCondition<T> implements Function, Serializable { ... }
So the thing that you pass to where
has to be Serializable
. Your client can serialize your whole job including all function definitions and send it to the JobManager, which distributes it to the different TaskManagers. Because every piece of the infrastructure also has your job jar, it can deserialize the job including your function. Deserialization also means it creates copies of the function, which is necessary for distributed execution.