string scala apache-spark

Remove leading zeroes from a string / varchar in Spark Scala


I have a variable (a 23-byte varchar) whose values have leading zeroes (e.g. 0000000036754678). How can I remove the leading zeroes from this variable?


Solution

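  • You can use the built-in Spark function regexp_replace as follows, provided that you have already loaded your data into a DataFrame called dataframe and that the column containing your values is called value. The pattern ^0* is anchored to the start of the string, so only the leading zeroes are removed: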

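    import org.apache.spark.sql.functions.{col, regexp_replace}

    // "^0*" matches the run of zeroes at the start of the string.
    // withColumn returns a new DataFrame; the original one is left unchanged.
    dataframe.withColumn("value", regexp_replace(col("value"), "^0*", ""))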
    

    If you have the following dataframe as input:

    +----------------+
    |value           |
    +----------------+
    |0000000036754678|
    +----------------+
    

    you will get the following result:

    +--------+
    |value   |
    +--------+
    |36754678|
    +--------+
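
One edge case to be aware of: with the pattern ^0*, a value made up entirely of zeroes (for example 0000) becomes an empty string. If you would rather keep a single 0 in that case, a negative lookahead is one option. The following is only a minimal sketch and not part of the original answer; the object name StripLeadingZeroes, the constant KeepLastZero, the local SparkSession and the sample rows are all assumptions made for illustration:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, regexp_replace}

object StripLeadingZeroes {
  // Hypothetical constant: one or more leading zeroes, but never the
  // last character of the string, so "0000" is reduced to "0".
  val KeepLastZero: String = "^0+(?!$)"

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, used only to make the sketch runnable.
    val spark = SparkSession.builder()
      .appName("strip-leading-zeroes")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows, including an all-zero value.
    val dataframe = Seq("0000000036754678", "0000", "120").toDF("value")

    // Same call as in the answer, with only the pattern changed.
    val stripped = dataframe.withColumn(
      "value",
      regexp_replace(col("value"), KeepLastZero, "")
    )

    stripped.show(false)
    spark.stop()
  }
}
```

With this pattern the three sample values should come out as 36754678, 0 and 120. Zeroes inside or at the end of a value are left alone because the pattern is anchored to the start of the string.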