If JSON file has field names which has spaces in it, how to read those JSON files to Athena without doing any pre processing?
If you want to read a JSONL (Line separated JSON) with Keys having space, best approach is
LazySimpleSerDe
Eg:
Source JSON file looks like :
{"ts": 20230127191413,"event Name": "App Launch","eventProps": {"CT Source": "Mobile" }}
{"ts": 2023012719567,"event Name": "App Launch","eventProps": {"CT Source": "Mobile" }}
Create table:
CREATE EXTERNAL TABLE IF NOT EXISTS tempdb.csv_json_parse(
data string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
LOCATION 's3://<bucket>/<prefix_to_file>/';
Now parse the data string to JSON and extract:
with jsd as (select
json_parse(data) as data
from tempdb.csv_json_parse)
select
json_extract(data, '$["ts"]') as ts ,
json_extract(data, '$["event Name"]') as eventname ,
json_extract(data, '$["eventProps"]["CT Source"]') as eventprops_ct_source
from jsd;