python, pyspark, amazon-dynamodb

How do I write to DynamoDB from PySpark without the AttributeValues?


I have a DynamicFrame with the following schema:

root
 |-- data1: string (nullable = false)
 |-- data2: string (nullable = false)
 |-- data3: array (nullable = false)
 |    |-- element: string (containsNull = true)

Now, when I write this to DynamoDB using:

from awsglue.dynamicframe import DynamicFrame

glue_context.write_dynamic_frame_from_options(
    frame=DynamicFrame.fromDF(df, glue_context, "output"),
    connection_type="dynamodb",
    connection_options={
        "dynamodb.output.tableName": "table_name",
        "dynamodb.throughput.write.percent": "1.0",
    },
)

the data3 field is written as [ { "L" : [ { "S" : "" }, { "S" : "" }, { "S" : "" }, { "S" : "" } ] } ], but I want it written as ["","","",""].

How do I achieve this?


Solution

  • DynamoDB always stores data in DynamoDB-JSON, which includes the type descriptors that you refer to as AttributeValues.

    This blog post highlights the difference between the two formats.

    Depending on how and where you read the data, you can use an unmarshall function to convert it back to native JSON (see the sketch below) or use one of the high-level SDKs, as explained in the aforementioned blog post. The web console also has a toggle, so you can switch between both JSON representations when viewing items in the console.
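
    For example, here is a minimal sketch of unmarshalling with boto3's TypeDeserializer; the item layout mirrors the schema above, and the attribute values are assumed for illustration:

    from boto3.dynamodb.types import TypeDeserializer

    deserializer = TypeDeserializer()

    # Item in DynamoDB-JSON, as returned by the low-level boto3 client
    item = {
        "data1": {"S": "a"},
        "data2": {"S": "b"},
        "data3": {"L": [{"S": ""}, {"S": ""}, {"S": ""}, {"S": ""}]},
    }

    # Unmarshall each attribute back to native Python types
    native = {key: deserializer.deserialize(value) for key, value in item.items()}

    print(native["data3"])  # ['', '', '', '']

    Alternatively, the high-level boto3.resource("dynamodb").Table(...) interface unmarshalls items for you, so get_item and query return native Python types directly.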