In AWS S3 I have JSON documents that I read in with AWS Glue's create_dynamic_frame.from_options("s3", ...),
and DynamicFrame.printSchema() shows me this, which matches the schema of the documents:
root
|-- updatedAt: string
|-- json: struct
| |-- rowId: int
Then I unnest() or relationalize() (I have tried both) the DynamicFrame into a new dyF, and .printSchema() shows me this, which looks correctly unnested:
root
|-- updatedAt: string
|-- json.rowId: int
The problem is that I can't seem to use the nested fields.
dyF.select_fields(["updatedAt"]) works and gives me a dyF with the "updatedAt" field.
But dyF.select_fields(["json.rowId"]) gives me an empty dyF.
What am I doing wrong?
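The solution is to wrap the flattened column name in backticks. Without them, the dot in "json.rowId" is read as a path into a nested struct rather than as part of the column name.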
Example: .select_fields(["journalId", "`json.rowId`"])
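If there are many flattened columns, the quoting can be applied automatically. The following is only a minimal sketch: quote_field and quote_fields are hypothetical helper names, not part of the Glue API, and the sketch assumes that every name containing a dot is a flattened column produced by unnest() rather than a real nested path.

```python
def quote_field(name: str) -> str:
    """Wrap a column name in backticks if it contains a dot.

    Hypothetical helper, not part of AWS Glue. Assumes a dotted name is a
    flattened column (e.g. from unnest()), not a path into a nested struct.
    """
    already_quoted = len(name) >= 2 and name.startswith("`") and name.endswith("`")
    if "." in name and not already_quoted:
        return "`" + name + "`"
    return name


def quote_fields(names):
    """Apply quote_field to every name in a list of column names."""
    return [quote_field(n) for n in names]


paths = quote_fields(["updatedAt", "json.rowId"])
print(paths)

# Inside a Glue job, with the unnested DynamicFrame dyF from the question,
# the result would then be passed on like this:
# selected = dyF.select_fields(paths)
```

Names without a dot, and names that are already wrapped in backticks, are returned unchanged, so the helper can be applied to a whole list of column names.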