Tags: python, list, dataframe, pyspark, zip

Need to convert list to dataframe in pyspark


I have the below code in Python, but I need to convert it to PySpark:

# flag rows where `id` is a substring of `question`
qm1['c1'] = [x[0] in x[1] for x in zip(qm1['id'], qm1['question'])]
# cast the boolean flag to string and keep only the 'True' rows
qm1['c1'] = qm1['c1'].astype(str)
qm1a = qm1[(qm1.c1 == 'True')]

The output of this Python code is:

question  key  id    c1
Women     0    omen  True
machine   0    mac   True

Could someone please help me with this? I am a beginner in PySpark.


Solution

  • here is my test data (as your question does not contain any):

    df.show()
    +--------+---+----+
    |question|key|  id|
    +--------+---+----+
    |   Women|  0|omen|
    | machine|  2| mac|
    |     foo|  1| bar|
    +--------+---+----+
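    For reference, here is a minimal sketch of how a test DataFrame like the one above could be built; the SparkSession setup and the literal rows are assumptions taken from the df.show() output, not from the answer itself:

    from pyspark.sql import SparkSession

    # assumed setup: reuse or create a local SparkSession
    spark = SparkSession.builder.getOrCreate()

    # recreate the sample rows shown above (values copied from df.show())
    df = spark.createDataFrame(
        [("Women", 0, "omen"), ("machine", 2, "mac"), ("foo", 1, "bar")],
        ["question", "key", "id"],
    )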
    

    and my code to create the expected output:

    from pyspark.sql import functions as F
    
    df = df.withColumn("c1", F.col("question").contains(F.col("id")))
    df.show()
    +--------+---+----+-----+
    |question|key|  id|   c1|
    +--------+---+----+-----+
    |   Women|  0|omen| true|
    | machine|  2| mac| true|
    |     foo|  1| bar|false|
    +--------+---+----+-----+
    

    then you can simply filter on c1:

    df.where("c1").show()
    +--------+---+----+----+
    |question|key|  id|  c1|
    +--------+---+----+----+
    |   Women|  0|omen|true|
    | machine|  2| mac|true|
    +--------+---+----+----+
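
    As a side note, the intermediate c1 column is optional: since the original pandas code only needs the filtered frame qm1a, the substring check and the filter can be combined in one step. The qm1a name below just mirrors the question; this is a minimal sketch of the same idea:

    from pyspark.sql import functions as F

    # filter directly on the substring check, without materializing c1
    qm1a = df.where(F.col("question").contains(F.col("id")))
    qm1a.show()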