Tags: scala, apache-spark, apache-spark-sql, space

Unable to remove trailing space in Spark Scala DataFrame


I have a DataFrame and have applied different functions to remove the trailing space, but with no luck:

df.select(col("e_no"),regexp_replace(col("e_no"),"//s+$",""),rtrim(col("e_no")),length(col("e_no"))).show()

+--------+------------------------------+-----------+------------+
| e_no   | regexp_replace(e_no, //s+$, )| rtrim(e_no)| length(e_no)|
+--------+------------------------------+-----------+------------+
| 525071 | 525071                       | 525071    | 7          |
| 512938 | 512938                       | 512938    | 6          |
| 522783 | 522783                       | 522783    | 7          |
+--------+------------------------------+-----------+------------+

Could you please advise?


Solution

  • The rtrim function should work. As for regexp_replace, your pattern "//s+$" matches two literal slashes, not whitespace; the correct regular expression is

    "\s+$"

    Working code, tested with Spark 2.2.1:

    import spark.implicits._
    import org.apache.spark.sql.functions._
    
    val list = Seq("525071 ", "512938", "522783 ")
    val df = list.toDF("e_no")
    
    df.select(
      col("e_no"),
      regexp_replace(col("e_no"), "\\s+$", ""),
      rtrim(col("e_no")),
      length(col("e_no"))
    ).show()
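To see why the original pattern fails without spinning up Spark at all, here is a plain-Scala sketch using `String.replaceAll`, which takes the same Java regex syntax that `regexp_replace` uses (the string literals here are illustrative sample values):

```scala
object RegexCheck extends App {
  val raw = "525071 " // note the trailing space, length 7

  // "//s+$" looks for two literal slashes followed by the letter 's',
  // so it never matches and the trailing space survives.
  val broken = raw.replaceAll("//s+$", "")
  println(broken.length) // 7 — nothing was removed

  // "\\s+$" is the regex \s+$ once Scala string escaping is applied:
  // one or more whitespace characters anchored at the end of the string.
  val fixed = raw.replaceAll("\\s+$", "")
  println(fixed.length) // 6 — trailing space stripped
}
```

The key point is the double backslash: in a Scala string literal, `"\\s+$"` produces the regex `\s+$`, whereas `"//s+$"` is just two forward slashes and never matches whitespace.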