Does anyone know how I could write sequence processing in Python in the style of the Java Stream API? The idea is to write the operations in the order in which they are applied:
myList.stream()
.filter(condition)
.map(action1)
.map(action2)
.collect(Collectors.toList());
In Python I could write:
[action2(action1(item)) for item in my_list if condition(item)]
But that reads in the opposite order.
How can I write the operations in the order they happen? Obviously I could use intermediate variables, but then I would have to come up with a name for each partial result.
There is a library that already does exactly what you are looking for: evaluation is lazy and the operations are applied in the order they are written. It also comes with other useful features, such as multiprocess and multithreaded map/reduce.
It is called pyxtension. It is production ready, covered by unit tests, maintained on PyPI, and released under the MIT license, so you can use it for free in any commercial project.
Your code would be rewritten in this form:
from pyxtension.streams import stream
# The outer parentheses let the method chain span several lines.
result = (
    stream(myList)
    .filter(condition)
    .map(action1)
    .map(action2)
    .toList()
)
or, with the parallel map variants:
result = (
    stream(myList)
    .filter(condition)
    .mpmap(action1)    # multi-process map
    .fastmap(action2)  # multi-threaded map
    .toList()
)
Note that the final call, toList(), does exactly what you would expect: it collects the data into a list, as collecting a Spark RDD would.
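If you would rather not add a dependency, the same reading order can be approximated with a small wrapper around the built-in lazy map() and filter(). The sketch below is only an illustration of the idea: the Pipeline class and its to_list() method are hypothetical names made up for this example, and this is not how pyxtension is implemented.

```python
from typing import Callable, Generic, Iterable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


class Pipeline(Generic[T]):
    """Hypothetical fluent wrapper around an iterable (not pyxtension's code)."""

    def __init__(self, items: Iterable[T]) -> None:
        self._items = items

    def filter(self, predicate: Callable[[T], bool]) -> "Pipeline[T]":
        # Built-in filter() is lazy: nothing runs until the result is consumed.
        return Pipeline(filter(predicate, self._items))

    def map(self, func: Callable[[T], U]) -> "Pipeline[U]":
        # Built-in map() is lazy as well.
        return Pipeline(map(func, self._items))

    def to_list(self) -> List[T]:
        # Terminal operation: pulls every item through the whole chain.
        return list(self._items)


# Operations read top to bottom, in the order they are applied.
result = (
    Pipeline(range(10))
    .filter(lambda x: x % 2 == 0)  # keep even numbers
    .map(lambda x: x + 1)          # first action
    .map(lambda x: x * 10)         # second action
    .to_list()
)
print(result)  # [10, 30, 50, 70, 90]
```

Because each step only wraps the previous iterator, no intermediate list is built and no partial result needs a name. A wrapper like this can be consumed only once, so build a new Pipeline if you need to iterate again.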