Tags: json, amazon-web-services, aws-glue, unnest

AWS Glue - Can't select fields after unnest or relationalize


I have JSON documents in S3 that I read in with AWS Glue's create_dynamic_frame.from_options("s3" ...). DynamicFrame.printSchema() shows the following, which matches the schema of the documents:

root
|-- updatedAt: string
|-- json: struct
|    |-- rowId: int

I then flatten the DynamicFrame into a new dyF with unnest() or relationalize() (I have tried both). Calling .printSchema() on it shows the following, which looks correctly unnested:

root
|-- updatedAt: string
|-- json.rowId: int
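To see why this schema matters, here is a plain-Python sketch (not Glue's actual implementation) that flattens a nested record the way the unnested schema above suggests: child keys are joined to their parent with ".". The flatten helper and the sample record are hypothetical. The point is that the result has one top-level key literally named "json.rowId" and no "json" struct at all.

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Hypothetical helper: flatten nested dicts, joining key paths with '.'
    to mimic the column names shown by printSchema() after unnest()."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into structs; child names become "<parent>.<child>".
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat


# Hypothetical sample document with the same shape as the S3 JSON docs.
doc = {"updatedAt": "2020-01-01T00:00:00Z", "json": {"rowId": 42}}
flat = flatten(doc)
print(flat)  # {'updatedAt': '2020-01-01T00:00:00Z', 'json.rowId': 42}
print("json" in flat)  # False: the "json" struct no longer exists
```

A selector that reads "json.rowId" as the path json -> rowId would therefore find nothing in the flattened data.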

The problem is that I can't seem to use the formerly nested fields:

dyF.select_fields(["updatedAt"]) works and returns a dyF with the "updatedAt" field.
dyF.select_fields(["json.rowId"]) returns an empty dyF.

What am I doing wrong?


Solution

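  • The solution is to put backticks around the column name. After unnesting, "json.rowId" is a single top-level column whose name contains a dot. Without backticks, the dot is treated as a path separator (a field rowId inside a struct json), and since that struct no longer exists, nothing is selected.

    Example: .select_fields(["journalId", "`json.rowId`"])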
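If you build the field list programmatically, a small helper can apply the backtick quoting for you. This is a minimal sketch: quote_fields is a hypothetical name, not part of the Glue API, and it assumes that only names containing a "." need quoting, as in the example above. Names that are already wrapped in backticks are left unchanged.

```python
from typing import List


def quote_fields(names: List[str]) -> List[str]:
    """Hypothetical helper: wrap dotted field names in backticks so they are
    read as one literal column name rather than a nested path."""
    quoted: List[str] = []
    for name in names:
        already_quoted = name.startswith("`") and name.endswith("`")
        if "." in name and not already_quoted:
            quoted.append(f"`{name}`")
        else:
            quoted.append(name)
    return quoted


print(quote_fields(["journalId", "json.rowId"]))
# ['journalId', '`json.rowId`']

# In a Glue job this would be passed straight to select_fields, e.g.:
#   dyF.select_fields(quote_fields(["journalId", "json.rowId"]))
```

Quoting only the dotted names keeps plain names exactly as they were, which matches the mixed list in the example.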