apache-spark apache-spark-sql amazon-dynamodb aws-glue

Issues while fetching attribute names in Array Struct field during ingesting the data from DynamDB to Hive table in S3

I am trying to ingest data from DynamoDB table to Hive table built over S3 using AWS Glue.

The data from Source would look like below -

{
 "id": "123",
 "name": "Peter",
 "title": "My List",
 "listItems": [
  {
   "manufacture_date": "2023-01-01",
   "purchase_price": "20.0"
  },
  {
   "manufacture_date": "2023-01-02",
   "purchase_price": "30.0"
  }
 ]
}

Code to ingest the data:

datasource = glue_context.create_dynamic_frame.from_options(
                            connection_type="dynamodb",
                            connection_options={"dynamodb.input.tableName": "purchase_list"},
                            transformation_ctx="datasource")
ddb_src_df = datasource.toDF()
ddb_src_df.createOrReplaceTempView("ddb_src_df")


inst_sql = f"""insert overwrite table purchase_list
        select id,
name,
title,
cast(listItems as string) as listItems
from ddb_src_df"""

glue_context.spark_session.sql(inst_sql)

I would like to have the Attribute names populated on the table so that I can unnest later from the table. But the above code is returning me without Attribute names like below -

id	name	title	listItems
123	Peter	My List	[{2023-01-01, 20.0}, {2023-01-02,30.0}]

How can I achieve something like this below?

Expected output:

id	name	title	listItems
123	Peter	My List	[{manufacture_date:2023-01-01,purchase_price:20.0}, {manufacture_date:2023-01-02,purchase_price:30.0}]

Target table strucuture:

id string
name string
title string
listItems string

I wanted to have the struct column listItems as String as the Attribute has potential to add more attributes in the future. Please advice.

Solution

You're almost there. In your insert, replace cast(listItems as string) with to_json(listItems) and you'll get strings "json-style". You can then remove double-quotes around attribute names and values using regexp_replace or something, if needed.