scala, dataframe, apache-spark, null, drop

Drop records from a DataFrame based on nulls present in any column


How can I select only the rows whose AccountID has no rows containing nulls?

  +---------+------+-----+
  |AccountID|  Name|Price|
  +---------+------+-----+
  |       11|miguel| null|
  |       11| luisa|   21|
  |       12|  hary|   90|
  |       13|  null|   99|
  |       14|  marg|   90|
  |       14|  null|   99|
  +---------+------+-----+

  DF.na.drop().show()

  +---------+------+-----+
  |AccountID|  Name|Price|
  +---------+------+-----+
  |       11| luisa|   21|
  |       12|  hary|   90|
  |       14|  marg|   90|
  +---------+------+-----+

na.drop() removed every row that has a null value in any column of the DataFrame. What I want is to also drop the other record with AccountID 11, i.e. delete all records for an AccountID if a null is present in any column of any record for that AccountID.

In this case the output should contain only AccountID 12, since AccountIDs 11 and 14 both have records with null values.

  +---------+------+-----+
  |AccountID|  Name|Price|
  +---------+------+-----+
  |       12|  hary|   90|
  +---------+------+-----+
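
A minimal sketch to reproduce the sample DataFrame above (the SparkSession setup, the toDF call and the use of Option for the nullable columns are my assumptions, not part of the original post):

  import org.apache.spark.sql.SparkSession

  // assumed local session, only needed to reproduce the example
  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("drop-null-accounts")
    .getOrCreate()
  import spark.implicits._

  // nullable columns are modelled with Option so the null cells can be expressed
  val data: Seq[(Int, Option[String], Option[Int])] = Seq(
    (11, Some("miguel"), None),
    (11, Some("luisa"),  Some(21)),
    (12, Some("hary"),   Some(90)),
    (13, None,           Some(99)),
    (14, Some("marg"),   Some(90)),
    (14, None,           Some(99))
  )
  val DF = data.toDF("AccountID", "Name", "Price")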

Solution

  • To paraphrase your question: you want to keep only the rows whose AccountID never appears in a row that contains a null.

    import org.apache.spark.sql.functions.col

    // select the rows which have a null in any column
    val idsWithNulls = DF.filter(col("Name").isNull || col("Price").isNull)
    // a left_anti join keeps only the AccountIDs that never appear among those 'bad' rows
    DF.join(idsWithNulls, Seq("AccountID"), "left_anti").show()

    The result will be:

    +---------+----+-----+
    |AccountID|Name|Price|
    +---------+----+-----+
    |       12|hary|   90|
    +---------+----+-----+
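
    An alternative sketch that yields the same result with a window over AccountID instead of an anti-join (the helper column name hasNull is mine, and which variant performs better depends on your data):

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.{col, max, when}

    // flag each AccountID that has at least one null in Name or Price
    val hasNull = max(when(col("Name").isNull || col("Price").isNull, 1).otherwise(0))
      .over(Window.partitionBy("AccountID"))

    // keep only the accounts whose flag is 0, then drop the helper column
    DF.withColumn("hasNull", hasNull)
      .filter(col("hasNull") === 0)
      .drop("hasNull")
      .show()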