Tags: apache-spark, pyspark, apache-spark-sql, decimal, trim

How to trim trailing zeros from a decimal value without changing the data type using PySpark?


My requirement is to remove trailing zeros from a decimal value. I have tried regex and strip() to remove the trailing zeros, and that worked, but regex and strip() operate on the string data type. I want Col_2 to stay decimal without changing its precision and scale. Can someone please suggest an alternative?

Sample table:

Col_1: string

Col_2: decimal(18,2)

Col_1  Col_2
ABC    2.00
DEF    2.50
XMN    0.00
RST    1.28

Actual result using regex:

Col_1: string

Col_2: string

Col_1  Col_2
ABC    2
DEF    2.5
XMN
RST    1.28

Expected result:

Col_1  Col_2
ABC    2
DEF    2.5
XMN    0
RST    1.28

Solution

  • A decimal(18,2) type always stores exactly 2 digits after the decimal point. Whether the trailing zeros to the right of the decimal point are displayed is purely a matter of formatting. So, if you want Col_2 to stay decimal and preserve its precision and scale, store it as decimal(18,2) and format it the way you want only when displaying the data.

    You can use the format_number function for that:

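    import pyspark.sql.functions as F
    
    # The SQL format_number accepts a DecimalFormat-style pattern string; the Python
    # F.format_number helper only takes an int number of decimal places, hence F.expr.
    # '0.##' keeps at least one integer digit and at most two fractional digits.
    # The formatted column is a string; df itself still holds Col_2 as decimal(18,2).
    df.withColumn("Col_2", F.expr("format_number(Col_2, '0.##')")).show()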
    
    #+-----+-----+
    #|Col_1|Col_2|
    #+-----+-----+
    #|  ABC|    2|
    #|  DEF|  2.5|
    #|  XMN|    0|
    #|  RST| 1.28|
    #+-----+-----+
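The same idea can be seen outside Spark with Python's standard decimal module. This is only an analogy, not Spark code: the names `stored` and `display` are hypothetical. The values keep a fixed scale of 2, like a decimal(18,2) column, and trailing zeros are dropped only when a string is rendered for display.

```python
from decimal import Decimal

# Analogy for a decimal(18,2) column: every value is quantized to scale 2.
SCALE = Decimal("0.01")
stored = [Decimal(v).quantize(SCALE) for v in ("2", "2.5", "0", "1.28")]


def display(value: Decimal) -> str:
    """Hypothetical helper: render a Decimal without trailing fractional zeros."""
    # normalize() strips trailing zeros (and may switch to exponent form, e.g. 1E+2);
    # the "f" format turns that back into plain fixed-point text such as "100".
    return format(value.normalize(), "f")


for d in stored:
    # The stored value keeps its scale; only the displayed string changes.
    print(f"{d} -> {display(d)}")
# 2.00 -> 2
# 2.50 -> 2.5
# 0.00 -> 0
# 1.28 -> 1.28
```

The stored values are unchanged (each still has exponent -2), which mirrors keeping Col_2 as decimal(18,2) in the DataFrame and applying format_number only in the projection you show or export.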