I am trying to understand how the foreach method works. In my Jupyter notebook, I tried:
def f(x): print(x)
a = sc.parallelize([1, 2, 3, 4, 5])
b = a.foreach(f)
print(type(b))
<class 'NoneType'>
This runs without any problem, but I get no output apart from the print(type(b)) part. foreach doesn't return anything, just None. I don't know what foreach is supposed to do or how to use it. Can you explain what it is used for?
foreach is an action and does not return anything, so you cannot use it the way you do, i.e. by assigning its result to another variable as in b = a.foreach(f). From Learning Spark, p. 41-42:
Adapting the simple example from the docs, run in a PySpark terminal:
>>> def f(x): print(x)
>>> a = sc.parallelize([1, 2, 3, 4, 5])
>>> a.foreach(f)
5
4
3
1
2
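(NOTE: I am not sure about Jupyter, but the above code will not produce any printed output in a Databricks notebook. The function passed to foreach runs on the executors, so whatever it prints goes to the executors' stdout rather than to the notebook cell.)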
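To make the difference between "run for side effects" and "return values" concrete, here is a minimal plain-Python sketch that needs no Spark installation. ToyRDD is a hypothetical stand-in written for this answer, not Spark's implementation; it only mimics what foreach, map and collect hand back to the caller.

```python
# Toy stand-in for an RDD (hypothetical, NOT Spark's code): it only mimics
# the return-value behaviour of foreach versus map + collect.
class ToyRDD:
    def __init__(self, items):
        self._items = list(items)

    def foreach(self, func):
        # Action run purely for its side effects; func's results are discarded.
        for item in self._items:
            func(item)
        # No return statement, so the caller receives None.

    def map(self, func):
        # Transformation: builds a new dataset from func's return values.
        return ToyRDD(func(item) for item in self._items)

    def collect(self):
        # Action that hands the elements back to the caller as a list.
        return list(self._items)


# Side-effect target. This works here only because everything runs in one
# process; in real Spark the function is shipped to the executors, so
# appending to a driver-side list would not be visible on the driver.
seen = []

a = ToyRDD([1, 2, 3, 4, 5])
b = a.foreach(seen.append)               # side effect only
c = a.map(lambda x: x * 10).collect()    # values come back

print(b)     # None
print(seen)  # [1, 2, 3, 4, 5]
print(c)     # [10, 20, 30, 40, 50]
```

With a real RDD, the usual way to see the elements in a notebook is to bring them to the driver first, for example by looping over a.collect() and printing there (only sensible for small datasets), and to keep foreach for work whose effect lives outside the driver, such as writing each element to an external system.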
You may also find the answers in this thread helpful.