Tags: azure, pyspark, types, databricks, azure-databricks

How to turn off scientific notation in values in PySpark?


I have a DataFrame with one decimal column of type decimal(23,8). Some values in this column are 0, but PySpark displays them as 0E-8. This causes problems after we convert the DataFrame to CSV.

How can I turn off scientific notation? I don't want 0E-8; the value should be 0.00000000. Can anyone help me with this?

I tried casting, but it did not work.


Solution

  • Import the format_number function:

    from pyspark.sql.functions import format_number
    

    Apply the function to your decimal column to format it as a string with the desired number of decimal places.

    In this example, format_number formats the decimal column with 8 decimal places, and the result is stored in a new string column called "formatted_column".

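    from pyspark.sql.functions import format_number

    # `spark` is the active SparkSession (predefined in Databricks notebooks)
    data = [(0.00000000,), (123.45678901,), (9876.54321098,)]
    columns = ["decimal_column"]
    df = spark.createDataFrame(data, columns)

    # format_number returns a string rounded to 8 decimal places
    df = df.withColumn("formatted_column", format_number("decimal_column", 8))
    df.show()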
    


    • Format the decimal column with 8 decimal places

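    Because format_number returns a string in fixed-point form, a zero value should appear as 0.00000000 rather than 0E-8, both in df.show() and in the CSV output. Note that format_number also inserts thousands separators (for example, 9,876.54321098). If those are unwanted in the CSV, strip them afterwards, for example with regexp_replace.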
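
To see where 0E-8 comes from, the behavior can be reproduced in plain Python without Spark. Spark decimals are, as far as I know, rendered through Java's BigDecimal string conversion, which switches to exponent notation for very small adjusted exponents; Python's decimal module follows the same rule, so a zero with 8 fractional digits prints as 0E-8 there too. The sketch below is only an illustration under that assumption, and the helper name to_plain_string is hypothetical, not part of PySpark.

```python
from decimal import Decimal


def to_plain_string(value, scale=8):
    """Render a decimal value in fixed-point form with `scale` fractional digits."""
    if value is None:
        # Keep nulls as nulls instead of turning them into text
        return None
    # The "f" presentation type never switches to exponent notation
    return format(Decimal(value), f".{scale}f")


zero = Decimal("0.00000000")
print(str(zero))                                   # 0E-8
print(to_plain_string(zero))                       # 0.00000000
print(to_plain_string(Decimal("9876.54321098")))   # 9876.54321098
```

If the built-in functions do not give the exact format you need, a helper like this could be wrapped with pyspark.sql.functions.udf and a StringType return type, since PySpark hands decimal columns to Python UDFs as decimal.Decimal values. A UDF is usually slower than built-in functions such as format_number, so treat it as a fallback.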