I have a table, where I would like to loop through the array in column "Type". If it contains 'cover', take the value from the "ID" array of the same position. And if there are 2 'cover' values, check the "style" column - if it is 'multi', ignore and take other value position.
ID | Type | style |
---|---|---|
[222, 345, 678] | ['leg','cover','cover'] | ['modern','multi','traditional'] |
[989, 787, 125] | ['cover','hanger','comp'] | ['modern','modern','modern'] |
The desired output:
ID | Type | style | output |
---|---|---|---|
[222, 345, 678] | ['leg','cover','cover'] | ['modern','multi','traditional'] | 678 |
[989, 787, 125] | ['cover','hanger','comp'] | ['modern','modern','modern'] | 989 |
I am using this code to get the position but I am stuck with the rest.
df.select(df.Type, array_position(df.Type, "cover").alias('a_pos')).show()
You could create a zipped array and filter out elements which you don't need.
Input:
from pyspark.sql import functions as F
df = spark.createDataFrame(
[([222, 345, 678], ['leg','cover','cover'], ['modern','multi','traditional']),
([989, 787, 125], ['cover','hanger','comp'], ['modern','modern','modern']),
([123, 234, 345], ['cover','hanger','comp'], ['multi','modern','modern']),
([456, 567, 678], ['couch','hanger','comp'], ['modern','modern','modern'])],
['ID', 'Type', 'style'])
Script:
filtered = F.filter(
F.arrays_zip('ID', 'Type', 'style'),
lambda x: (x.Type == 'cover') & (x.style != 'multi')
)
df = df.withColumn('output', filtered[0].ID)
df.show(truncate=0)
# +---------------+---------------------+----------------------------+------+
# |ID |Type |style |output|
# +---------------+---------------------+----------------------------+------+
# |[222, 345, 678]|[leg, cover, cover] |[modern, multi, traditional]|678 |
# |[989, 787, 125]|[cover, hanger, comp]|[modern, modern, modern] |989 |
# |[123, 234, 345]|[cover, hanger, comp]|[multi, modern, modern] |null |
# |[456, 567, 678]|[couch, hanger, comp]|[modern, modern, modern] |null |
# +---------------+---------------------+----------------------------+------+