Search code examples
datedatetimeapache-spark-sqldatediff

Get days difference between two datetimes in SparkSQL


I want to get the integer days difference between two datetimes in SparkSQL, but it ignores the time part and returns a different result than expected.

For example, the query below returns 9, but I expected 8:

SELECT DATEDIFF(CAST('2021-07-10 02:26:16' AS TIMESTAMP), CAST('2021-07-01 19:10:28' AS TIMESTAMP))

I can achieve the expected result by converting the datetime to long, so I can get the seconds difference between them and convert the result to days, casting to integer, like:

SELECT CAST((CAST(CAST('2021-07-10 02:26:16' AS TIMESTAMP) AS LONG) - CAST(CAST('2021-07-01 19:10:28' AS TIMESTAMP) AS LONG))/(60*60*24) AS INTEGER)

I wanted to know if there is a 'more recommended way' of doing that, like using some built-in SparkSQL function.

Thanks in advance.


Solution

  • I would recommend using the extract SQL function and apply it to the interval (difference of two timestamps).

    Extracts a part of the date/timestamp or interval source

    *) extract function is available in Spark from version 3.x on.

    See example below

    WITH input AS (
        select TIMESTAMP'2021-07-10 02:26:16' t2,
               TIMESTAMP'2021-07-01 19:10:28' t1
    )
    SELECT
        datediff(t2, t1) `datediff`,
        extract(day FROM t2-t1) `extract`
    FROM input
    

    returns

    datediff extract
    9 8