I have the following schema:
--items : array
|-- element : struct
|-- id : long
|-- value : double
|-- stock : struct
|-- centerList : array
|-- element : struct
|-- center : struct
I can iterate over items using
df = df.withColumn(‘items’, F.transform(‘items’, lambda x : x.dropFields(‘value’)
But how can I iterate over centerList? I’ve tried:
df = df.withColumn(‘items’, F.transform(‘items’, lambda x : x.withField(‘stock.centerList’, F.transform(‘centerList’, lambda y: y.drop(‘center’)))
But when I run my code I got the following error:
[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter with name ‘centerList’ cannot be resolved.
Your logic is correct but you need to properly reference the centerList
column of the current stock in the inner transformation operation
def transform(col):
return F.transform(
col,
lambda x: x.withField(
'stock.centerList',
F.transform(
x['stock']['centerList'],
lambda y: y.dropFields('center')
)
)
)
df = df.withColumn('items', transform('items'))