Tags: python, apache-spark, pyspark, rdd

Convert an RDD to iterable: PySpark?


I have an RDD which I create by loading a text file and preprocessing it. I don't want to collect the entire dataset and save it to disk or memory; instead I want to pass it to some other function in Python that consumes the data one element after another, in the form of an iterable.

How is this possible?

data = sc.textFile('file.txt').map(lambda x: some_func(x))

an_iterable = data.  ## what should I do here to make it give me one element at a time?

def model1(an_iterable):
    for i in an_iterable:
        do_that(i)

model1(an_iterable)

Solution

  • I believe what you want is toLocalIterator():
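A minimal sketch of how this fits the question's code. `toLocalIterator()` returns a plain Python iterator that pulls the RDD's contents to the driver one partition at a time, so the whole dataset is never materialized at once the way `collect()` would. The `some_func` and `do_that` bodies below are placeholders standing in for the question's preprocessing and consumption logic:

```python
def some_func(line):
    # placeholder for the question's preprocessing step
    return line.strip()

def do_that(x):
    # placeholder for the per-element work done by the model
    return x.upper()

def model1(an_iterable):
    # consumes any iterable one element at a time; it never needs
    # to know whether the source is a list or a lazy RDD iterator
    return [do_that(i) for i in an_iterable]

def run_with_spark(path):
    # sketch of the Spark side, assuming a local SparkContext
    from pyspark import SparkContext
    sc = SparkContext.getOrCreate()
    data = sc.textFile(path).map(some_func)
    # toLocalIterator() yields elements lazily, partition by partition
    return model1(data.toLocalIterator())
```

Because `model1` only requires an iterable, it works unchanged whether you feed it a plain Python list for testing or `data.toLocalIterator()` in the Spark job. Note that each partition still has to fit in the driver's memory as it is fetched.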