I have the following schema.
 |-- items: array
 |    |-- element: struct
 |    |    |-- id: long
 |    |    |-- value: double
 |    |    |-- stock: array
 |    |    |    |-- element: string
I'm trying to drop the stock field from this schema. My desired output is:
 |-- items: array
 |    |-- element: struct
 |    |    |-- id: long
 |    |    |-- value: double
I've tried to drop the field using the following code:
df = df.withColumn('items', F.col('items').dropFields('stock'))
This gives me the following error:
Parameter 1 requires “STRUCT” type, however “items” has type “Array<Struct…”
I also tried casting:
df = df.withColumn("items", F.col("items").cast(cast))
Here, cast is the desired schema (the same type without stock), but I got the following error:
Cannot resolve “items” due to data type mismatch: cannot cast “source schema…” to “desired schema…”
So my question is: how can I drop the stock field to get the desired output?
dropFields requires a column of struct type, but in your case the column contains an array of structs. The solution is to use the transform function to apply dropFields to each struct inside the array:
df = df.withColumn('items', F.transform('items', lambda x: x.dropFields('stock')))
df.printSchema() then shows:
root
 |-- items: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- id: long (nullable = true)
 |    |    |-- value: double (nullable = true)