pyspark, apache-spark-sql

Why does Spark SQL skip milliseconds when casting?


I don't know why this Spark SQL query drops the three trailing zeros after the decimal point:

    df = spark.sql("""
        select cast('1900-01-01 00:00:00.000' as timestamp) as a from dbo.products
    """)

    df.show(1, False)

The query above produces 1900-01-01 00:00:00; it drops the three zeros after the decimal point.

The expected output is:

    1900-01-01 00:00:00.000

Can someone help, please?


Solution

  • It seems that when the milliseconds are .000, Spark skips that part by default when displaying the value:

    >>> spark.sql("SELECT CAST('1900-01-01 00:00:00.000' AS TIMESTAMP)").show(truncate=False)
    +------------------------------------------+
    |CAST(1900-01-01 00:00:00.000 AS TIMESTAMP)|
    +------------------------------------------+
    |1900-01-01 00:00:00                       |
    +------------------------------------------+
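
    Note that the cast itself loses nothing; only the string rendering used by show drops a zero fractional part. As a quick check (added here as a small illustration, not from the original question), you can verify that the two literals denote the same instant:

    >>> spark.sql("SELECT CAST('1900-01-01 00:00:00.000' AS TIMESTAMP) = CAST('1900-01-01 00:00:00' AS TIMESTAMP) AS same_instant").show()
    +------------+
    |same_instant|
    +------------+
    |        true|
    +------------+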
    

    If you set the milliseconds to, for example, .999, they will be displayed:

    >>> spark.sql("SELECT CAST('1900-01-01 00:00:00.999' AS TIMESTAMP)").show(truncate=False)
    +------------------------------------------+
    |CAST(1900-01-01 00:00:00.999 AS TIMESTAMP)|
    +------------------------------------------+
    |1900-01-01 00:00:00.999                   |
    +------------------------------------------+
    

    If you really need that part to be displayed, use date_format:

    >>> from pyspark.sql import functions as F
    >>> df = spark.sql("SELECT CAST('1900-01-01 00:00:00.000' AS TIMESTAMP) datecol")
    >>> df.select(F.date_format("datecol", "yyyy-MM-dd HH:mm:ss.SSS").alias("datecol")).show(truncate=False)
    +-----------------------+
    |datecol                |
    +-----------------------+
    |1900-01-01 00:00:00.000|
    +-----------------------+
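
    If you prefer to keep everything inside the SQL text (as in the original select), date_format is also available as a built-in SQL function. A minimal sketch of the same formatting applied directly in SQL:

    >>> spark.sql("SELECT date_format(CAST('1900-01-01 00:00:00.000' AS TIMESTAMP), 'yyyy-MM-dd HH:mm:ss.SSS') AS datecol").show(truncate=False)
    +-----------------------+
    |datecol                |
    +-----------------------+
    |1900-01-01 00:00:00.000|
    +-----------------------+

    Keep in mind that date_format returns a string column, not a timestamp, so use it for display or export rather than for further timestamp arithmetic.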