Search code examples
sqlmongodbamazon-s3aws-glueamazon-athena

How can I query at struct<$oid:string> in aws athena


I want to query data that is stored in MongoDB and exported out into a number of JSON files, stored in S3

I am using AWS Glue to read the files into Athena however the data type for the id on each table is imported as struct<$oid:string>

I have tried every variation of adding quotations around the fields with no luck. everything I try results in the error name expected at the position 7 of 'struct<$oid:string>' but '$' is found.

Is there any way I can read these tables in their current form or do I need to declare their type in Glue?


Solution

  • Glue Crawlers create schemas that match what they find, without considering if they will work with, for example Athena. In Athena you can't have a struct property with an initial $, but Glue doesn't take that into account – partly because maybe you will be using the table with something else where that is not a problem, and partly because what else can it do, that is the name of the property.

    There are two ways around it, but neither will work if you continue to use a crawler. You will need to modify the table schema, but if you continue to run the crawler it will just revert it back again.

    The first, and probably simplest option, is to change the type of the column to STRING and then use a JSON function at query time to extract the value using JSONPath ($ is a special character in JSONPath, but you should be able to escape it).

    The second option is to use the "mappings" feature of the Hive JSON serde. I'm not 100% sure if it will work for this case, but it could be worth a try. The docs are not very extensive on how to configure it, unfortunately.