I want to get the integer days difference between two datetimes in SparkSQL, but it ignores the time part and returns a different result than expected.
For example, the query below returns 9, but I expected 8:
SELECT DATEDIFF(CAST('2021-07-10 02:26:16' AS TIMESTAMP), CAST('2021-07-01 19:10:28' AS TIMESTAMP))
I can achieve the expected result by converting the datetime to long, so I can get the seconds difference between them and convert the result to days, casting to integer, like:
SELECT CAST((CAST(CAST('2021-07-10 02:26:16' AS TIMESTAMP) AS LONG) - CAST(CAST('2021-07-01 19:10:28' AS TIMESTAMP) AS LONG))/(60*60*24) AS INTEGER)
I wanted to know if there is a 'more recommended way' of doing that, like using some built-in SparkSQL function.
Thanks in advance.
I would recommend using the extract
SQL function and apply it to the interval (difference of two timestamps).
Extracts a part of the date/timestamp or interval source
*) extract
function is available in Spark from version 3.x on.
See example below
WITH input AS (
select TIMESTAMP'2021-07-10 02:26:16' t2,
TIMESTAMP'2021-07-01 19:10:28' t1
)
SELECT
datediff(t2, t1) `datediff`,
extract(day FROM t2-t1) `extract`
FROM input
returns
datediff | extract |
---|---|
9 | 8 |