I have an Array[Row] and I want to turn it into either a Dataset[Row] or a DataFrame.
How did I come up with an Array of Rows?
Well, I was trying to clear nulls from my dataset without relying on the .na.drop() function from DataFrameNaFunctions, because it fails to detect when a cell actually contains the string "null". So I came up with the following line to filter out nulls in all columns:
val outDF = inputDF.columns.flatMap { col => inputDF.filter(col + "!='' AND " + col + "!='null'").collect() }
The problem is that outDF is an Array[Row], hence the question! Any ideas welcome!
I'm posting the answer as per my comment.
df.na.drop(df.columns).where("'null' not in ("+df.columns.mkString(",")+")")
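For anyone who wants to try this out, here is a minimal sketch of the one-liner above on toy data, assuming a local SparkSession and string columns with plain names (names containing spaces or dots would need backticks in the SQL expression). The object name, the helper names dropNullLike and fromRows, and the sample columns c1 and c2 are made up for illustration. It also shows one way to do the literal conversion asked about: passing the collected rows back to createDataFrame together with the original schema. Note that in Scala, DataFrame is just a type alias for Dataset[Row], so both targets in the question are the same type.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

object NullCleanupSketch {
  // Hypothetical helper: drop rows containing a real null, then rows where
  // any column holds the literal string "null" (same expression as the answer).
  def dropNullLike(df: DataFrame): DataFrame =
    df.na.drop(df.columns)
      .where("'null' not in (" + df.columns.mkString(",") + ")")

  // Hypothetical helper: rebuild a DataFrame from an Array[Row] by reusing
  // the schema of the DataFrame the rows were collected from.
  def fromRows(spark: SparkSession, rows: Array[Row], template: DataFrame): DataFrame =
    spark.createDataFrame(spark.sparkContext.parallelize(rows.toSeq), template.schema)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is good enough for a quick experiment.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("null-cleanup-sketch")
      .getOrCreate()
    import spark.implicits._

    // Toy data: one clean row, two rows with the string "null", one real null.
    val inputDF = Seq(
      ("a", "x"),
      ("null", "y"),
      (null, "z"),
      ("b", "null")
    ).toDF("c1", "c2")

    // Stays a DataFrame the whole time, so no collect() is needed.
    dropNullLike(inputDF).show()

    // The literal Array[Row] -> DataFrame conversion from the question.
    val rows: Array[Row] = inputDF.collect()
    fromRows(spark, rows, inputDF).printSchema()

    spark.stop()
  }
}
```

With this sample data, only the ("a", "x") row should survive dropNullLike. Keeping everything as DataFrame transformations also avoids pulling the whole dataset to the driver, which collect() does. The where clause here does not filter empty strings, so the !='' check from the question would have to be added separately if it is still needed.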