Search code examples
jsonamazon-web-servicesamazon-s3amazon-athena

Loading a json files with spaces in filed names to Athena Table


If JSON file has field names which has spaces in it, how to read those JSON files to Athena without doing any pre processing?


Solution

  • If you want to read a JSONL (Line separated JSON) with Keys having space, best approach is

    1. Create a table to read JSON as string using LazySimpleSerDe
    2. Parse the data as json and extract columns.

    Eg:

    Source JSON file looks like :

    {"ts": 20230127191413,"event Name": "App Launch","eventProps": {"CT Source": "Mobile" }}
    {"ts": 2023012719567,"event Name": "App Launch","eventProps": {"CT Source": "Mobile" }}
    

    Create table:

    CREATE EXTERNAL TABLE IF NOT EXISTS tempdb.csv_json_parse(
    data string
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
    LOCATION 's3://<bucket>/<prefix_to_file>/';
    

    Now parse the data string to JSON and extract:

    with jsd as (select 
    json_parse(data) as data
    from tempdb.csv_json_parse)
    select 
    json_extract(data, '$["ts"]') as ts ,
    json_extract(data, '$["event Name"]') as eventname ,
    json_extract(data, '$["eventProps"]["CT Source"]') as eventprops_ct_source 
    from jsd;