I have the code below in Python, but I need to convert it to PySpark:
qm1['c1'] = [x[0] in x[1] for x in zip(qm1['id'], qm1['question'])]
qm1['c1'] = qm1['c1'].astype(str)
qm1a = qm1[(qm1.c1 == 'True')]
The output of this Python code is:
question | key | id | c1 |
Women | 0 | omen | True |
machine | 0 | mac | True |
Could someone please help me with this? I am a beginner in Python.
Here is my test data (as your question does not contain any):
|question|key| id|
| Women| 0|omen|
| machine| 2| mac|
| foo| 1| bar|
And here is my code to create the expected output:
from pyspark.sql import functions as F
df = df.withColumn("c1", F.col("question").contains(F.col("id")))
|question|key| id| c1|
| Women| 0|omen| true|
| machine| 2| mac| true|
| foo| 1| bar|false|
Then you can simply filter on c1, for example with df.filter(F.col("c1")):
|question|key| id| c1|
| Women| 0|omen|true|
| machine| 2| mac|true|