Tags: apache-spark, pyspark

Apply a lambda function to a nested field in Spark


I have the following schema:

 |-- items: array
 |    |-- element: struct
 |    |    |-- id: long
 |    |    |-- value: double
 |    |    |-- stock: struct
 |    |    |    |-- centerList: array
 |    |    |    |    |-- element: struct
 |    |    |    |    |    |-- center: struct

I can iterate over items using

df = df.withColumn('items', F.transform('items', lambda x: x.dropFields('value')))
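
For anyone who wants to reproduce this, a minimal DataFrame with roughly this shape could look like the sketch below. The name and qty leaf fields and the values are made-up placeholders; only the nesting mirrors the schema above.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    # Placeholder data: only the nesting mirrors the schema above; 'name' and
    # 'qty' stand in for leaf fields that are not shown there.
    df = spark.createDataFrame(
        [([{'id': 1,
            'value': 9.99,
            'stock': {'centerList': [{'center': {'name': 'A'}, 'qty': 3}]}}],)],
        'items array<struct<id: long, value: double, stock: struct<'
        'centerList: array<struct<center: struct<name: string>, qty: long>>>>>',
    )

    df = df.withColumn('items', F.transform('items', lambda x: x.dropFields('value')))
    df.printSchema()  # 'value' no longer appears under items.element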

But how can I iterate over centerList? I’ve tried:

df = df.withColumn('items', F.transform('items', lambda x: x.withField('stock.centerList', F.transform('centerList', lambda y: y.dropFields('center')))))

But when I run my code, I get the following error:

[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter with name `centerList` cannot be resolved.

Solution

  • Your logic is correct, but the inner transformation needs to reference the centerList of the current item. F.transform('centerList', ...) asks Spark to resolve centerList as a top-level column of the DataFrame, which is why it is reported as unresolved; reference it through the lambda variable instead, e.g. x['stock']['centerList']:

    from pyspark.sql import functions as F

    def transform(col):
        # For each item, replace stock.centerList with a copy of that item's
        # own centerList whose elements have the 'center' field dropped.
        return F.transform(
            col,
            lambda x: x.withField(
                'stock.centerList',
                F.transform(
                    x['stock']['centerList'],  # resolved relative to the current item
                    lambda y: y.dropFields('center')
                )
            )
        )

    df = df.withColumn('items', transform('items'))
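
A quick way to check the result against the placeholder DataFrame sketched in the question is to print the schema: center should no longer appear under centerList, while the remaining fields (the assumed qty) are untouched.

    df.printSchema()
    # Abbreviated output (nullability flags omitted); 'value' is absent only
    # because the question's earlier dropFields('value') transform was applied.
    # root
    #  |-- items: array
    #  |    |-- element: struct
    #  |    |    |-- id: long
    #  |    |    |-- stock: struct
    #  |    |    |    |-- centerList: array
    #  |    |    |    |    |-- element: struct
    #  |    |    |    |    |    |-- qty: long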