
Splitting an RDD row into different columns in PySpark


This is a continuation of my previous question.

I am trying to find the index of 'e' in the following RDD using PySpark:

['a,b,c,d,e,f']

I am using the method:

rdd.zipWithIndex().lookup('e')

But I get []

since the RDD is actually in the form [['a,b,c,d,e,f']] (a list nested inside the RDD).

I tried

rdd.flatMap(lambda x: x)

so that I can use lookup to get the index, but I am still getting [].
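What `flatMap(lambda x: x)` does here can be reproduced in plain Python (no Spark needed): flattening one level of the nested list still leaves the comma-separated string as a single element, which is why the lookup finds nothing. A minimal local sketch of the same steps:

```python
# Local stand-in for the nested RDD contents: [['a,b,c,d,e,f']]
data = [['a,b,c,d,e,f']]

# flatMap(lambda x: x) flattens exactly one level of nesting
flat = [item for sublist in data for item in sublist]
print(flat)  # ['a,b,c,d,e,f'] -- still one string, not six values

# zipWithIndex() + lookup('e') equivalent: no element equals 'e'
indexed = list(zip(flat, range(len(flat))))
result = [i for value, i in indexed if value == 'e']
print(result)  # []
```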

How do I get the RDD into the form:

['a','b','c','d','e','f']

so that I can use this method:

    rdd.zipWithIndex().lookup('e')

Solution

  • The issue is that the whole string is being treated as a single element:

    ['a,b,c,d,e,f']
    

    Here, a,b,c,d,e,f is all one string, so no element of the RDD equals 'e'. You need to split it into separate rows of the RDD. You can use flatMap with split(",") to turn the string into individual RDD rows, and then apply zipWithIndex() and lookup():

    print(rdd.flatMap(lambda x: x.split(",")).zipWithIndex().lookup("e"))
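    The same flatten-split-index logic can be checked without a cluster. A minimal pure-Python equivalent of the pipeline above (local lists standing in for the RDD):

    ```python
    # Local stand-in for the RDD contents
    data = ['a,b,c,d,e,f']

    # flatMap(lambda x: x.split(",")): split each element, flatten one level
    flat = [token for element in data for token in element.split(",")]
    print(flat)  # ['a', 'b', 'c', 'd', 'e', 'f']

    # zipWithIndex(): pair each element with its position
    indexed = list(zip(flat, range(len(flat))))

    # lookup("e"): collect every index paired with the key 'e'
    result = [i for value, i in indexed if value == "e"]
    print(result)  # [4]
    ```

    With the string split into six separate rows, zipWithIndex() assigns 'e' the index 4, and lookup("e") returns [4].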