Search code examples
amazon-web-serviceshiveqlddlamazon-athena

Athena Create External Table ParseException


Hi I am trying to run the following command in Athena

CREATE EXTERNAL TABLE transport_evaluator_prod(
    messageId STRING,
    type STRING,
    causationId STRING,
    correlationId STRING,
    traceparent STRING,
    `data` struct < 
    evaluationOccurred: STRING,
    eta struct < distance: INT,
    timeToDestination: INT,
    eta: STRING,
    destination struct < latitude: DOUBLE,
    longitude: DOUBLE,
    altitude: DOUBLE >,
    destinationEventId: STRING,
    origin struct < latitude: DOUBLE,
    longitude: DOUBLE,
    altitude: DOUBLE >, originEventId: STRING,
    plannedArrival: STRING,
    locationActionReference: STRING,
    resourceUrn: STRING,
    eventProvider: STRING,
    occured: STRING,
    position struct < latitude: DOUBLE,
    longitude: DOUBLE,
    altitude: DOUBLE >, 
    equipmentNumber: STRING,
    received: STRING > >
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
    'serialization.format' = '1',
    'ignore.malformed.json' = 'true'
)
LOCATION 'changed-for-security'
TBLPROPERTIES ('has_encrypted_data' = 'false')

Unfortunately when I try to run this I get the following error : FAILED: ParseException line 1:189 missing : at 'struct' near '' line 1:262 missing : at 'struct' near '' line 1:363 missing : at 'struct' near '' line 1:579 missing : at 'struct' near ''

Can someone please help?


Solution

  • In reading Create Tables in Amazon Athena from Nested JSON and Mappings Using JSONSerDe | AWS Big Data Blog, I notice that any fields inside a STRUCT should be referenced as field_name:type.

    This also applies to struct.

    Therefore, this type of line (which is inside a struct):

    destination struct < latitude: DOUBLE,
    

    should be:

    destination:struct < latitude: DOUBLE,
    

    Thus, this seems to work:

    CREATE EXTERNAL TABLE transport_evaluator_prod(
        messageId STRING,
        type STRING,
        causationId STRING,
        correlationId STRING,
        traceparent STRING,
        `data` struct < 
            evaluationOccurred: STRING,
            eta:struct < distance: INT,
                timeToDestination: INT,
                eta: STRING,
                destination:struct < latitude: DOUBLE,
                    longitude: DOUBLE,
                    altitude: DOUBLE >,
                destinationEventId: STRING,
                origin:struct < latitude: DOUBLE,
                    longitude: DOUBLE,
                    altitude: DOUBLE >,
                originEventId: STRING,
                plannedArrival: STRING,
                locationActionReference: STRING,
                resourceUrn: STRING,
                eventProvider: STRING,
                occured: STRING,
                position:struct < latitude: DOUBLE,
                    longitude: DOUBLE,
                    altitude: DOUBLE >, 
                equipmentNumber: STRING,
                received: STRING > >
    )
    ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
    WITH SERDEPROPERTIES (
        'serialization.format' = '1',
        'ignore.malformed.json' = 'true'
    )
    LOCATION 'changed-for-security'
    TBLPROPERTIES ('has_encrypted_data' = 'false')