
Create column of decimal type when creating a dataframe


I would like to provide numbers when creating a Spark dataframe, but I am having issues with decimal-type numbers.

This way the number gets truncated:

import pyspark.sql.functions as F

df = spark.createDataFrame([(10234567891023456789.5, )], ["numb"])
df = df.withColumn("numb_dec", F.col("numb").cast("decimal(30,1)"))
df.show(truncate=False)
#+---------------------+----------------------+
#|numb                 |numb_dec              |
#+---------------------+----------------------+
#|1.0234567891023456E19|10234567891023456000.0|
#+---------------------+----------------------+

This fails:

df = spark.createDataFrame([(10234567891023456789.5, )], "numb decimal(30,1)")
df.show(truncate=False)

TypeError: field numb: DecimalType(30,1) can not accept object 1.0234567891023456e+19 in type <class 'float'>

How can I correctly provide big decimal numbers so that they don't get truncated?


Solution

  • This is caused by the limited precision of Python floats: a float is a 64-bit IEEE 754 double with only about 15–17 significant decimal digits, so the literal is already rounded before Spark ever receives it (the error message above already shows 1.0234567891023456e+19). You can pass string values when creating the dataframe instead; an alternative using decimal.Decimal is sketched after this example:

    df = spark.createDataFrame([("10234567891023456789.5", )], ["numb"])
    
    df = df.withColumn("numb_dec", F.col("numb").cast("decimal(30,1)"))
    df.show(truncate=False)
    #+----------------------+----------------------+
    #|numb                  |numb_dec              |
    #+----------------------+----------------------+
    #|10234567891023456789.5|10234567891023456789.5|
    #+----------------------+----------------------+
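
    If you want a true decimal column from the start (skipping the extra cast), you can also pass Python decimal.Decimal objects together with an explicit schema. A minimal sketch, assuming an active SparkSession named spark as in the question; Decimal parses the string exactly, so no precision is lost:

    from decimal import Decimal

    # A float literal is rounded to ~15-17 significant digits,
    # while Decimal keeps every digit:
    print(10234567891023456789.5)             # 1.0234567891023456e+19
    print(Decimal("10234567891023456789.5"))  # 10234567891023456789.5

    df = spark.createDataFrame(
        [(Decimal("10234567891023456789.5"), )],
        "numb decimal(30,1)",  # the schema string that failed with a float
    )
    df.show(truncate=False)
    #+----------------------+
    #|numb                  |
    #+----------------------+
    #|10234567891023456789.5|
    #+----------------------+

    Both approaches work for the same reason: the value never passes through a float, which is where the digits are lost.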