Performance difference between Table- and DataStream-API

Let us assume that I have two operations which I can write easily using both APIs in PyFlink (e.g. a sum of a column over a TumblingWindow). Are there any performance differences when I use the predefined Table-API commands vs manually implementing the count in Python as a ProcessWindowFunction?

To be more precise, I want to compare

table_from_stream \
    .window(Tumble.over(lit(15).minutes)).on(col('time')).alias('w'))
    .groupby(col('w'), col('a'))
    .select(col('w').end, col('a'), col('b').sum)

datastream \
    .key_by('a') \
    .window(TumblingEventTimeWindows.of(Time.minutes(15))) \ 
    .process(MyProcessFunctionThatManuallySums)

Solution

The version using the Table API will be more efficient because

The job will run entirely in Java, with no need to cross the java/python barrier. Python will only be used to setup the pipeline.
A single piece of managed state (per key) will be used to accumulate a running sum, rather than collecting a list of values to add up at the end of each window. OTOH, the datastream job will have to send that possibly very large list to Python.
The Table API has a very efficient serializer.
Filter pushdown will be used to filter out any columns other than A or B, rather than those columns being serialized and passed through the pipeline.