Search code examples
sqlamazon-web-servicesamazon-athenaprestotrino

Is there a way to cast string to date to int in athena?


I'm working on use case where i need to convert the string type to integer type in AWS Athena

Below is my input data -

year    month
2022    jan
2022    feb
2012    apr
2023    may
2019    jun

Below is what I'm expecting -

year    month   yearmon
2022    jan     202201
2022    feb     202202
2012    apr     201204
2023    may     202305
2019    dec     201912

Initially I thought to use case statement wherein I hardcode for each month, concatenate this with year column and finally parse the yearmon as int. Something like -

concat(year, case when month = 'jan' then '01'
...
... end) 

Finally cast it to int

I tried below formats -

Select month("jan")
Select cast("jan" as date)

But nothing seem to have worked.

Is there a better way to cast this use case than hard coding the values in case statement?


Solution

  • You can try using date_parse and date_format. The following works in Trino on which the Athena is based on:

    -- sample data
    with dataset(year, month) as (
        values (2022, 'jan'),
        (2022, 'feb'),
        (2012, 'apr'),
        (2023, 'may'),
        (2019, 'jun')
    )
    
    -- query
    select date_format(
                 date_parse(cast(year as varchar) || '-' || month, '%Y-%b'),
                 '%Y%m'
               )
    from dataset;
    

    Which produces the following output (TBH I was a bit surprised that it can handle lowercase month names):

    _col0
    202201
    202202
    201204
    202305
    201906

    If this will not work for your actual data then using case-when is the approach you might want to take.