Search code examples
arraysdataframeapache-sparkpysparkposition

Find the position of a value in an array and use the position to get a value from array in other column


I have a table, where I would like to loop through the array in column "Type". If it contains 'cover', take the value from the "ID" array of the same position. And if there are 2 'cover' values, check the "style" column - if it is 'multi', ignore and take other value position.

ID Type style
[222, 345, 678] ['leg','cover','cover'] ['modern','multi','traditional']
[989, 787, 125] ['cover','hanger','comp'] ['modern','modern','modern']

The desired output:

ID Type style output
[222, 345, 678] ['leg','cover','cover'] ['modern','multi','traditional'] 678
[989, 787, 125] ['cover','hanger','comp'] ['modern','modern','modern'] 989

I am using this code to get the position but I am stuck with the rest.

df.select(df.Type, array_position(df.Type, "cover").alias('a_pos')).show()

Solution

  • You could create a zipped array and filter out elements which you don't need.

    Input:

    from pyspark.sql import functions as F
    df = spark.createDataFrame(
        [([222, 345, 678], ['leg','cover','cover'], ['modern','multi','traditional']),
         ([989, 787, 125], ['cover','hanger','comp'], ['modern','modern','modern']),
         ([123, 234, 345], ['cover','hanger','comp'], ['multi','modern','modern']),
         ([456, 567, 678], ['couch','hanger','comp'], ['modern','modern','modern'])],
        ['ID', 'Type', 'style'])
    

    Script:

    filtered = F.filter(
        F.arrays_zip('ID', 'Type', 'style'),
        lambda x: (x.Type == 'cover') & (x.style != 'multi')
    )
    df = df.withColumn('output', filtered[0].ID)
    
    df.show(truncate=0)
    # +---------------+---------------------+----------------------------+------+
    # |ID             |Type                 |style                       |output|
    # +---------------+---------------------+----------------------------+------+
    # |[222, 345, 678]|[leg, cover, cover]  |[modern, multi, traditional]|678   |
    # |[989, 787, 125]|[cover, hanger, comp]|[modern, modern, modern]    |989   |
    # |[123, 234, 345]|[cover, hanger, comp]|[multi, modern, modern]     |null  |
    # |[456, 567, 678]|[couch, hanger, comp]|[modern, modern, modern]    |null  |
    # +---------------+---------------------+----------------------------+------+