Tags: scala, apache-spark, hive, bigdata

Converting a column in scientific notation to decimal values


I have the following data in a column:

value
1.873452634567E7
null
1.87345265634544467E9
1.8734526563456347E10
1.8734526563456723E8

I tried:

df.withColumn("s", 'value.cast("Decimal(14,4)"))

but it's not helping.
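One likely reason the cast fails is plain arithmetic: Decimal(14,4) has 14 digits in total, 4 of them after the decimal point, which leaves only 10 for the integer part. The plain-Python sketch below (not Spark code) counts how many digits each sample value needs at scale 4. The helper names required_precision and fits are hypothetical, and rounding half-up is an assumption meant to approximate how Spark rounds decimal casts.

```python
from decimal import Decimal, ROUND_HALF_UP

# Sample values from the question (the null row is skipped).
values = [
    "1.873452634567E7",
    "1.87345265634544467E9",
    "1.8734526563456347E10",
    "1.8734526563456723E8",
]

def required_precision(text: str, scale: int) -> int:
    """Total digits needed to hold `text` once rounded to `scale` decimal places."""
    # Decimal() parses scientific notation exactly, e.g. "1.8734526563456347E10".
    rounded = Decimal(text).quantize(Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP)
    return len(rounded.as_tuple().digits)

def fits(text: str, precision: int, scale: int) -> bool:
    """True if the value fits in DECIMAL(precision, scale) without overflow."""
    return required_precision(text, scale) <= precision

for v in values:
    # value, digits needed at scale 4, fits DECIMAL(14,4), fits DECIMAL(38,4)
    print(v, required_precision(v, 4), fits(v, 14, 4), fits(v, 38, 4))
```

Only 1.8734526563456347E10 (18734526563.4563 after rounding, 15 digits) is too wide for Decimal(14,4); with spark.sql.ansi.enabled set to false, Spark turns such an overflowing cast into null, and with ANSI mode on it raises an error.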


Solution

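  • Cast to a floating-point type first and then to a decimal type wide enough for your largest value. Use double rather than float as the intermediate type: Spark's float is 32-bit and keeps only about 7 significant digits, so 1.8734526563456347E10 would be rounded to 18734526464. The wider target type matters just as much: Decimal(14,4) leaves only 10 digits before the decimal point, but 1.8734526563456347E10 (18734526563.456347) needs 11, so that row overflows. Decimal(38,4) leaves room for 34 integer digits. The example below is PySpark; the same casts work with Column.cast in Scala.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col
    
    # Create a Spark session
    spark = SparkSession.builder.appName("ScientificToDecimal").getOrCreate()
    
    # Sample data
    data = [("1.873452634567E7",),
            (None,),
            ("1.87345265634544467E9",),
            ("1.8734526563456347E10",),
            ("1.8734526563456723E8",)]
    
    # Define schema and create a DataFrame
    columns = ["value"]
    df = spark.createDataFrame(data, columns)
    
    # Cast to double first (64-bit, about 15-17 significant digits), then to a
    # decimal with enough integer digits for the largest value
    df = df.withColumn("value_double", col("value").cast("double"))
    df = df.withColumn("value_decimal", col("value_double").cast("Decimal(38,4)"))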
    
    # Show results
    df.show()
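To see why a 32-bit float is a risky intermediate type for these values, the plain-Python sketch below (not Spark code) round-trips the largest sample value through single and double precision with the struct module. It assumes Spark's FloatType and DoubleType behave like IEEE 754 single and double precision, which matches their 4-byte and 8-byte definitions; the helper names as_float32 and as_float64 are hypothetical.

```python
import struct
from decimal import Decimal

def as_float32(x: float) -> float:
    """Round-trip a value through IEEE 754 single precision (like Spark's FloatType)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def as_float64(x: float) -> float:
    """Round-trip a value through IEEE 754 double precision (like Spark's DoubleType)."""
    return struct.unpack("<d", struct.pack("<d", x))[0]

text = "1.8734526563456347E10"
exact = Decimal(text)   # exact decimal value of the string
value = float(text)     # nearest double

print(as_float32(value))                          # 18734526464.0
print(abs(Decimal(as_float32(value)) - exact))    # 99.456347, error from FloatType
print(abs(Decimal(as_float64(value)) - exact))    # error from DoubleType (a few millionths at most)
```

At this magnitude, single-precision values are spaced 2048 apart, so the float path cannot even keep the integer part intact, while the double path stays accurate well past the 4 decimal places kept by Decimal(38,4).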