I want to use Athena partition projection to write queries that filter on date partition columns. The issue is that I need the physical data format that will be projected (the S3 file prefixes) to differ from the date formats that users will query by.
For example, my physical data format in S3 stores prefixes as URL encoded dates: s3://bucket/table_root/landing_time=yyyy-MM-dd%20HH%3Amm%3Ass.S'
But I want users to be able to query Athena using readable dates (No URL encoded formatting):
select * from table_root where landing_time='2020-01-01 12:00:00.0'
I tried using the following table settings for partition projection:
projection.landing_time.range: NOW-2YEARS,NOW
projection.landing_time.type: date
projection.landing_time.interval: 1
projection.landing_time.interval.unit: HOURS
projection.landing_time.format: yyyy-MM-dd'%20'HH'%3A00%3A00.0'
projection.enabled: true
I see partition projection allows custom path derivations using table property: storage.location.template: s3://bucket/table_root/a=${a}/${b}/some_static_subdirectory/${c}/
But I still do not see any way to customize the projected format of date columns and allow it to differ from the format used to run queries. Is there any way to do this?
This can be solved by adding an additional partition column to handle hours and changed landing_time
to only handle the date part. After that, changing storage.location.template
to include the url encoded %20
and %3A
values required by URL encoding gave me the desired result.
projection.landing_time.range: 2020-01-01,NOW
projection.landing_time.type: date
projection.landing_time.interval: 1
projection.landing_time.interval.unit: DAYS
projection.landing_time.format: yyyy-MM-dd
projection.hours.type: integer
projection.hours.range: 0,23
projection.hours.digits: 2
projection.enabled: true
storage.location.template: s3://bucket/table_root/landing_time=${landing_time}%20${hours}%3A00%3A00.0
Athena queries using partition projection can now be written in the format:
select * from table_root where landing_time='2020-01-01' and hours=1;
select * from table_root where landing_time='2020-01-01' and hours>2 and hours<10;
And the correct format matching my S3 data prefixes will be projected to S3.