Search code examples
amazon-web-servicesaws-cloudformationserverlessamazon-athena

Amazon Athena : HIVE_METASTORE_ERROR: name expected at the position 22 of [...] but ' ' is found


I use a serverless file + CloudFormation to create a table on the AWS Athena service.

My serverless.yml:

...
CardBulkWorkgroup:
  Type: AWS::Athena::WorkGroup
  Properties:
    Name: ${opt:stage}-${opt:client}-CardBulk
    WorkGroupConfiguration:
      ResultConfiguration:
        OutputLocation: s3://${lower:${opt:stage}}-${lower:${opt:client}}-card-bulk-athena-result

CardBulkDatabase:
  Type: AWS::Glue::Database
  Properties:
    CatalogId: !Ref AWS::AccountId
    DatabaseInput:
      Name: ${lower:${opt:stage}}_${lower:${opt:client}}_bulkcard

CardBulkTable:
  Type: AWS::Glue::Table
  Properties:
    CatalogId: !Ref AWS::AccountId
    DatabaseName: !Ref CardBulkDatabase
    TableInput:
      Name: card
      StorageDescriptor:
        Columns:
          - Name: cardId
            Type: int
          - Name: metadata
            Type: struct<orderId:string, convertVirtualToPhysicalErrors:string>
          - Name: orderId
            Type: string
          - Name: errors
            Type: string
        Location: s3://${lower:${opt:stage}}_${lower:${opt:client}}-files/cards
        InputFormat: org.apache.hadoop.mapred.TextInputFormat
        OutputFormat: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
        SerdeInfo:
          SerializationLibrary: org.openx.data.jsonserde.JsonSerDe
          Parameters:
            "serialization.format": "1"

CardBulkAthenaBucketResult:
  Type: AWS::S3::Bucket
  Properties:
    BucketName: ${lower:${opt:stage}}-${lower:${opt:client}}-card-bulk-athena-result
...

When I deploy my stack, the database dev_connect_bulkcard and my table card are created correctly.

The issue:

With my API, when I want to retrieve data from card table from my dev_connect_bulkcard database, I get this error.

"HIVE_METASTORE_ERROR: Error: name expected at the position 22 of 'struct<orderId:string, convertVirtualToPhysicalErrors:string>' but ' ' is found.

But, if I delete the card table directly from the AWS console (from Athena Service) and, generate it with this query:

CREATE EXTERNAL TABLE `card`(
  `cardid` int COMMENT 'from deserializer', 
  `orderid` string COMMENT 'from deserializer', 
  `metadata` struct<orderid:string,convertvirtualtophysicalerrors:string> COMMENT 'from deserializer', 
  `errors` array<string> COMMENT 'from deserializer')
ROW FORMAT SERDE 
  'org.openx.data.jsonserde.JsonSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION
  's3://my-bucket-files/cards'
TBLPROPERTIES (
  'has_encrypted_data'='false', 
  'transient_lastDdlTime'='1627378097')

It's work, I can return the data from the card table with my API.

Do you have any idea why I have to delete my table manually and then recreate it to get the results back please?

Thanks in advance,

EDIT: Solution by Prabhakar Reddy:

The error is due to the space present in the column metadata. Remove the space between orderId and convertVirtualToPhysicalErrors . Athena will not accept space as a special character in it's column name. Refer this for more information.


Solution

  • The error is due to the space present in the column metadata. Remove the space between orderId and convertVirtualToPhysicalErrors . Athena will not accept space as a special character in it's column name. Refer this for more information.