Search code examples
sqldatetimeamazon-athenaprestotrino

Athena: Calculating Age from String "birth_dt" Column


I am working with a dataset that has a column titled "birth_dt" from which I am trying to calculate the age of the member as of 18 months ago from.

This is what "select distinct(birth_dt) from table.DB limit 9;" shows the column to look like:

birth_dt
-----------------------
1990-08-15 00:00:00.000
1986-05-30 00:00:00.000
1981-03-03 00:00:00.000
1970-05-11 00:00:00.000
1963-01-20 00:00:00.000
1985-01-26 00:00:00.000
1983-01-03 00:00:00.000
2021-01-03 00:00:00.000
1995-11-14 00:00:00.000

In Athena, this column shows as being a "string".

So far, what I have tried is:

select DATE_DIFF('year', birth_dt, (current_timestamp - interval '18' month)) as age from table.DB limit 10; 

which kept giving me an error saying:

"SYNTAX_ERROR: line 1:8: Unexpected parameters (varchar(4), varchar, timestamp with time zone) for function date_diff. Expected: date_diff(varchar(x), time, time) , date_diff(varchar(x), time with time zone, time with time zone) , date_diff(varchar(x), timestamp with time zone, timestamp with time zone) , date_diff(varchar(x), date, date) , date_diff(varchar(x), timestamp, timestamp)"

So then I tried to just focus on getting that "birth_dt" column to change to a valid date to no avail with the following:

select date_parse(birth_dt, '%YYYY-%MM-%dd') as age from table.DB limit 10;

select cast(from_iso8601_timestamp("birth_dt.1") AS date) as age from table.DB limit 10;

SELECT date_parse(birth_dt,'%Y-%m-%dT%H:%i:%s%+00:00') as date_column,
current_timestamp as Todays_Date,
date_diff('day',current_timestamp,date_parse(birth_dt,'%Y-%m-%dT%H:%i:%s%+00:00')) as difference
FROM table.DB limit 10;

These queries end up giving me errors that say either:

"INVALID_FUNCTION_ARGUMENT: Invalid format: "1990-08-15 00:00:00.000" is malformed at "-08-15 00:00:00.000""

or

"INVALID_FUNCTION_ARGUMENT: Invalid format: "1990-08-15 00:00:00.000" is malformed at "00:00:00.000""

What I want to eventually have for my examples above is:

Age
---
31
35
40
51
58
36
38
1
25

Please help! Thank you!!!


Solution

  • Providing valid format is crucial here. Some attempts where really close, try the following:

    -- sample data
    WITH dataset(birth_dt) AS (
        values ('1983-01-03 00:00:00.000'),
            ('2021-01-03 00:00:00.000')
    )
    
    -- query
    select date_diff(
            'year',
            date_parse(birth_dt, '%Y-%m-%d %T.%f'),
            current_timestamp - interval '18' month)
    from dataset;
    

    Output:

    _col0
    41
    3

    Check the MySQL Date Functions docs for all supported format specifiers and what they mean.