I have below sample row in spark dataframe.
{
"id":"B07H3MVTSN",
"mid":4,
"inner":{
"type1":[{
"cid":"B06XVVSLX8"
},
{
"cid":"B06XJ2JZ2Z"
}
],
"type2":[{
"cid":"B06XVVSLX1"
},
{
"cid":"B06XJ2JZ22"
}
],
"type3":[{
"cid":"B06XVVSLX3"
},
{
"cid":"B06XJ2JZ24"
}
]
}
}
I have a list of strings which can be {"type1","type2"}.
I want to filter out(get rid of) rows from the dataframe which do not have either of those two fields. How to add filter() condition which checks existence of nested struct fields from a set of values from a list
e.g below sample row should be removed using filter condition since it does not contain type1 and type2
{
"id":"B07H3MVTS1",
"mid":4,
"inner":{
"type3":[{
"cid":"B06XVVSLX3"
},
{
"cid":"B06XJ2JZ24"
}
]
}
}
Schema for the above data can be seen in below method getSchema()
private StructType getSchema(){
ArrayType types = DataTypes.createArrayType(DataTypes.createStructType(
new StructField[] {
DataTypes.createStructField("cid", DataTypes.StringType, false)
}));
StructType inner = DataTypes.createStructType(
new StructField[] {
DataTypes.createStructField("type1", types, true), DataTypes.createStructField("type2", types, true),
DataTypes.createStructField("type3", types, true),
DataTypes.createStructField("type4", types, true)
});
return DataTypes.createStructType(
new StructField[] {
DataTypes.createStructField("id", DataTypes.StringType, false),
DataTypes.createStructField("mid", DataTypes.IntegerType, false),
DataTypes.createStructField("inner", inner, false)
});
}
You can just filter
the list of fields with isNotNull
function as below, adjust the filter condition as per your need
val fields = List("type1", "type2")
val filter = fields.map(f => col(s"inner.$f").isNotNull).reduce(_ or _)
df.filter(filter).show(false)
Output:
+----------+------------------------------------------------------------------------------------------+---+
|id |inner |mid|
+----------+------------------------------------------------------------------------------------------+---+
|B07H3MVTSN|{[{B06XVVSLX8}, {B06XJ2JZ2Z}], [{B06XVVSLX1}, {B06XJ2JZ22}], [{B06XVVSLX3}, {B06XJ2JZ24}]}|4 |
+----------+------------------------------------------------------------------------------------------+---+