How to select only the rows with IDs for which there are no rows which contain nulls.
+---------+------+-----+
|AccountID| Name|Price|
+---------+------+-----+
| 11|miguel| null|
| 11|luisa | 21|
| 12| hary| 90|
| 13| null| 99|
| 14| marg| 90|
| 14| null| 99|
+---------+------+-----+
DF.na.drop().show()
+---------+------+-----+
|AccountID| Name|Price|
+---------+------+-----+
| 11|luisa | 21|
| 12| hary| 90|
| 14| marg| 90|
+---------+------+-----+
na.drop() removed all rows that have null values on any column of DataFrame. I would want to know how can I drop other record having AccountID 11. Delete all the records for same AccountID if null was present in any column/records for that particular AccountID
In this case output should be only AccountID 12.
As 11 and 14 AccountID's were having null values.
+---------+------+-----+
|AccountID| Name|Price|
+---------+------+-----+
| 12| hary| 90|
+---------+------+-----+
To paraphrase your question, you want to select only the rows with IDs for which there are no rows which contain nulls.
// select rows which have a null
val idsWithNulls = DF.filter(col("Name").isNull || col("Price").isNull)
// based on those ids remove 'bad' ids
DF.join(idsWithNulls, Seq("AccountId"), "left_anti")
The result will be:
+---------+----+-----+
|AccountId|Name|Price|
+---------+----+-----+
| 12|hary| 90|
+---------+----+-----+