Search code examples
dataframeapache-sparkapache-spark-sqlregexp-replace

How to remove the space in a column of dataframe and add string "NA" to it?


I have a dataframe (df) as shown in below. But in the column sensortype I have one field blank. I want to replace it with the string "na". How to do it?

+----------+-------+
|sensortype|offline|
+----------+-------+
|Sensor1   |2      |
|Sensor1   |0      |
|          |2      |
+----------+-------+

I tried this, but it didn't work:

df.withColumn("sensortype",regexp_replace(col("sensortype"),"\\s+","NA"))

Solution

  • This should work:

    df.withColumn("sensortype", when(length(col("sensortype"))===0,"NA")
    .otherwise(col("sensortype")))