apache-spark pyspark python-3.5 flatmap

pyspark flatMap error: TypeError: 'int' object is not iterable


This is the sample example code in my book:

from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("spark://chetan-ThinkPad-E470:7077").setAppName("FlatMap")
sc = SparkContext(conf=conf)

numbersRDD = sc.parallelize([1, 2, 3, 4])
actionRDD = numbersRDD.flatMap(lambda x: x + x).collect()
for values in actionRDD:
    print(values)

I am getting this error: TypeError: 'int' object is not iterable

    at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
    at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
    at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    ... 1 more

Solution

  • You cannot use flatMap to produce a single Int per element

    flatMap expects the mapped function to return an iterable (such as a list or tuple), which it then flattens into the resulting RDD. Your lambda returns x + x, which is a plain int, so Spark raises TypeError: 'int' object is not iterable.

    Since you want exactly one output value per input element, use map on your RDD of integers instead:

    numbersRDD = sc.parallelize([1, 2, 3, 4])
    actionRDD = numbersRDD.map(lambda x: x + x)

    def printing(x):
        print(x)  # print is a function in Python 3

    actionRDD.foreach(printing)
    

    which should print (note that foreach runs on the executors, so on a real cluster the output appears in the executor logs rather than the driver console, and the order is not guaranteed):

    2
    4
    6
    8
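    If you do want flatMap, just make the function return an iterable. As a sketch of the semantics (without a running Spark cluster), the pure-Python list comprehension below mimics what flatMap does: apply the function to each element, then flatten the resulting iterables. In PySpark the equivalent would be numbersRDD.flatMap(lambda x: [x, x + x]).collect().

    ```python
    data = [1, 2, 3, 4]

    def f(x):
        # Returns an iterable -- this is what flatMap requires
        return [x, x + x]

    # Apply f to each element, then flatten the per-element lists
    flat_mapped = [y for x in data for y in f(x)]
    print(flat_mapped)  # [1, 2, 2, 4, 3, 6, 4, 8]
    ```

    Note how each input contributes two outputs and the lists are merged into one flat sequence; that flattening step is exactly what distinguishes flatMap from map.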