I use a serverless file + CloudFormation to create a table on the AWS Athena service.
My serverless.yml:
...
CardBulkWorkgroup:
Type: AWS::Athena::WorkGroup
Properties:
Name: ${opt:stage}-${opt:client}-CardBulk
WorkGroupConfiguration:
ResultConfiguration:
OutputLocation: s3://${lower:${opt:stage}}-${lower:${opt:client}}-card-bulk-athena-result
CardBulkDatabase:
Type: AWS::Glue::Database
Properties:
CatalogId: !Ref AWS::AccountId
DatabaseInput:
Name: ${lower:${opt:stage}}_${lower:${opt:client}}_bulkcard
CardBulkTable:
Type: AWS::Glue::Table
Properties:
CatalogId: !Ref AWS::AccountId
DatabaseName: !Ref CardBulkDatabase
TableInput:
Name: card
StorageDescriptor:
Columns:
- Name: cardId
Type: int
- Name: metadata
Type: struct<orderId:string, convertVirtualToPhysicalErrors:string>
- Name: orderId
Type: string
- Name: errors
Type: string
Location: s3://${lower:${opt:stage}}_${lower:${opt:client}}-files/cards
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
SerdeInfo:
SerializationLibrary: org.openx.data.jsonserde.JsonSerDe
Parameters:
"serialization.format": "1"
CardBulkAthenaBucketResult:
Type: AWS::S3::Bucket
Properties:
BucketName: ${lower:${opt:stage}}-${lower:${opt:client}}-card-bulk-athena-result
...
When I deploy my stack, the database dev_connect_bulkcard and my table card are created correctly.
The issue:
With my API, when I want to retrieve data from card table from my dev_connect_bulkcard database, I get this error.
"HIVE_METASTORE_ERROR: Error: name expected at the position 22 of 'struct<orderId:string, convertVirtualToPhysicalErrors:string>' but ' ' is found.
But, if I delete the card table directly from the AWS console (from Athena Service) and, generate it with this query:
CREATE EXTERNAL TABLE `card`(
`cardid` int COMMENT 'from deserializer',
`orderid` string COMMENT 'from deserializer',
`metadata` struct<orderid:string,convertvirtualtophysicalerrors:string> COMMENT 'from deserializer',
`errors` array<string> COMMENT 'from deserializer')
ROW FORMAT SERDE
'org.openx.data.jsonserde.JsonSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION
's3://my-bucket-files/cards'
TBLPROPERTIES (
'has_encrypted_data'='false',
'transient_lastDdlTime'='1627378097')
It's work, I can return the data from the card table with my API.
Do you have any idea why I have to delete my table manually and then recreate it to get the results back please?
Thanks in advance,
EDIT: Solution by Prabhakar Reddy
:
The error is due to the space present in the column metadata. Remove the space between orderId and convertVirtualToPhysicalErrors . Athena will not accept space as a special character in it's column name. Refer this for more information.
The error is due to the space present in the column metadata
. Remove the space between orderId
and convertVirtualToPhysicalErrors
. Athena will not accept space as a special character in it's column name. Refer this for more information.