Search code examples
amazon-web-servicesamazon-athenaaws-glueaws-pinpoint

HIVE_INVALID_METADATA in Amazon Athena


How can I work around the following error in Amazon Athena?

HIVE_INVALID_METADATA: com.facebook.presto.hive.DataCatalogException: Error: : expected at the position 8 of 'struct<x-amz-request-id:string,action:string,label:string,category:string,when:string>' but '-' is found. (Service: null; Status Code: 0; Error Code: null; Request ID: null)

When looking at position 8 in the database table connected to Athena generated by AWS Glue, I can see that it has a column named attributes with a corresponding struct data type:

struct <
    x-amz-request-id:string,
    action:string,
    label:string,
    category:string,
    when:string
>

My guess is that the error occurs because the attributes field is not always populated (c.f. the _session.start event below) and does not always contain all fields (e.g. the DocumentHandling event below does not contain the attributes.x-amz-request-id field). What is the appropriate way to address this problem? Can I make a column optional in Glue? Can (should?) Glue fill the struct with empty strings? Other options?


Background: I have the following backend structure:

  • Amazon PinPoint Analytics collects metrics from my application.
  • The PinPoint event stream has been configured to forward the events to an Amazon Kinesis Firehose delivery stream.
  • Kinesis Firehose writes data to S3
  • Use AWS Glue to crawl S3
  • Use Athena to write queries based on the databases and tables generated by AWS Glue

I can see PinPoint events successfully being added to json files in S3, e.g.

First event in a file:

{
    "event_type": "_session.start",
    "event_timestamp": 1524835188519,
    "arrival_timestamp": 1524835192884,
    "event_version": "3.1",
    "application": {
        "app_id": "[an app id]",
        "cognito_identity_pool_id": "[a pool id]",
        "sdk": {
            "name": "Mozilla",
            "version": "5.0"
        }
    },
    "client": {
        "client_id": "[a client id]",
        "cognito_id": "[a cognito id]"
    },
    "device": {
        "locale": {
            "code": "en_GB",
            "country": "GB",
            "language": "en"
        },
        "make": "generic web browser",
        "model": "Unknown",
        "platform": {
            "name": "macos",
            "version": "10.12.6"
        }
    },
    "session": {
        "session_id": "[a session id]",
        "start_timestamp": 1524835188519
    },
    "attributes": {},
    "client_context": {
        "custom": {
            "legacy_identifier": "50ebf77917c74f9590c0c0abbe5522d2"
        }
    },
    "awsAccountId": "672057540201"
}

Second event in the same file:

{
    "event_type": "DocumentHandling",
    "event_timestamp": 1524835194932,
    "arrival_timestamp": 1524835200692,
    "event_version": "3.1",
    "application": {
        "app_id": "[an app id]",
        "cognito_identity_pool_id": "[a pool id]",
        "sdk": {
            "name": "Mozilla",
            "version": "5.0"
        }
    },
    "client": {
        "client_id": "[a client id]",
        "cognito_id": "[a cognito id]"
    },
    "device": {
        "locale": {
            "code": "en_GB",
            "country": "GB",
            "language": "en"
        },
        "make": "generic web browser",
        "model": "Unknown",
        "platform": {
            "name": "macos",
            "version": "10.12.6"
        }
    },
    "session": {},
    "attributes": {
        "action": "Button-click",
        "label": "FavoriteStar",
        "category": "Navigation"
    },
    "metrics": {
        "details": 40.0
    },
    "client_context": {
        "custom": {
            "legacy_identifier": "50ebf77917c74f9590c0c0abbe5522d2"
        }
    },
    "awsAccountId": "[aws account id]"
}

Next, AWS Glue has generated a database and a table. Specifically, I see that there is a column named attributes that has the value of

struct <
    x-amz-request-id:string,
    action:string,
    label:string,
    category:string,
    when:string
>

However, when I attempt to Preview table from Athena, i.e. execute the query

SELECT * FROM "pinpoint-test"."pinpoint_testfirehose" limit 10;

I get the error message described earlier.

Side note, I have tried to remove the attributes field (by editing the database table from Glue), but that results in Internal error when executing the SQL query from Athena.


Solution

  • This is a known limitation. Athena table and database names allow only underscore special characters#

    Athena table and database names cannot contain special characters, other than underscore (_). Source: http://docs.aws.amazon.com/athena/latest/ug/known-limitations.html