c# .net amazon-web-services amazon-dynamodb

Append new object to a DynamoDB document child-node


I have a large JSON object that I am trying to push into a DynamoDB table. It is larger than the 400 KB item size limit, so I have to break it up.

I tried to follow this question, which uses Node.js, but it is not working as expected: Append a new object to a JSON Array in DynamoDB using NodeJS

I have also read the docs here: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Expressions.ConditionExpressions.html

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Expressions.UpdateExpressions.html#Expressions.UpdateExpressions.Multiple

If the item were not so large, I could just call AWS.Dynamodb.Table.PutItemAsync(document, client);

DynamoDB is the preferred solution to this issue, though I know static JSON in S3 would also work, using Lambda to update it (daily?).

The two routes I have tried either smash the #data field into one long list of items (not a list of objects):

UpdateExpression = "SET #data = list_append(if_not_exists(#data, :empty_list), :item)",
ExpressionAttributeValues = new Dictionary<string, AttributeValue>
{
   { ":item", new AttributeValue { L = dataDocument.Values.ToList() } }
}

Or fail, citing that I cannot push an item of type AttributeValue.M using add:

UpdateExpression = "add #data :item",
ExpressionAttributeValues = new Dictionary<string, AttributeValue>
{
    { ":item", new AttributeValue { M = dataDocument }},
}

Method

var table = Table.LoadTable(_dynamoClient, AnalyticsTableName);
var strippedDoc = new AnalyticsDataCluster
{
    PK = item.PK,
    SK = item.SK,
    Data = new List<AnalyticsData>()
};
await table.PutItemAsync(Document.FromJson(JsonSerializer.Serialize(strippedDoc)));

foreach (var data in item.Data)
{
    var dataDocument = Document.FromJson(JsonSerializer.Serialize(data)).ToAttributeMap();
    var request = new UpdateItemRequest
    {
        TableName = AnalyticsTableName,
        Key = new Dictionary<string, AttributeValue>
        {
            { "PK", new AttributeValue { S = item.PK } },
            { "SK", new AttributeValue { S = item.SK } }
        },
        UpdateExpression = "add #data :item",
        ExpressionAttributeNames = new Dictionary<string, string>
        {
            { "#data", "Data" }
        },
        ExpressionAttributeValues = new Dictionary<string, AttributeValue>
        {
            { ":item", new AttributeValue { M = dataDocument } },
            { ":empty_list", new AttributeValue { IsLSet = true } },
            { ":id", new AttributeValue { S = data.Id } }
        },
    };
    var response = await _dynamoClient.UpdateItemAsync(request);
}

Data object

// Analytics data package
public class AnalyticsData
{
    // parameterless constructor and other members truncated for clarity
    public AnalyticsData(Dictionary<string, AttributeValue> attributeValues) : this()
    {
        Id = attributeValues.TryGetValue("Id", out var idValue) ? idValue.S : null;
        AccessDateTime = attributeValues.TryGetValue("AccessDateTime", out var loginDateTimeValue) ? DateTime.Parse(loginDateTimeValue.S) : DateTime.MinValue;
        RequestedUri = attributeValues.TryGetValue("RequestedUri", out var requestedUriValue) ? requestedUriValue.S : null;
        Person = new PersonBase
        {
            Id = Convert.ToInt32(attributeValues["Person"].M["Id"].N),
            FirstName = attributeValues["Person"].M.TryGetValue("FirstName", out var firstNameValue) ? firstNameValue.S : null,
            // truncated for clarity
        };
    }
    public string? Id { get; set; }
    public DateTime AccessDateTime { get; set; }
    public string? RequestedUri { get; set; }
    public PersonBase? Person { get; set; }
    // truncated for clarity
}
public class PersonBase
{
    public int Id { get; set; }
    public string? FirstName { get; set; }
    // etc
}

Error Message

Amazon.DynamoDBv2.AmazonDynamoDBException: 
'Invalid UpdateExpression: Incorrect operand type for operator or function; operator or function: list_append, operand type: M'
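
On the error itself: list_append requires both operands to be lists, and ADD only supports number and set data types, so a bare map (M) is rejected by both. A minimal sketch of appending one object as a single list element, assuming the AWSSDK.DynamoDBv2 package and the attribute names used above; AppendRequestSketch and Build are hypothetical names. It only addresses the expression and does not get around DynamoDB's per-item size limit.

```csharp
using System.Collections.Generic;
using Amazon.DynamoDBv2.Model;

// Hypothetical helper (not part of the AWS SDK): builds an UpdateItemRequest
// that appends one map to the "Data" list attribute.
public static class AppendRequestSketch
{
    public static UpdateItemRequest Build(
        string tableName, string pk, string sk,
        Dictionary<string, AttributeValue> dataDocument)
    {
        return new UpdateItemRequest
        {
            TableName = tableName,
            Key = new Dictionary<string, AttributeValue>
            {
                { "PK", new AttributeValue { S = pk } },
                { "SK", new AttributeValue { S = sk } }
            },
            UpdateExpression =
                "SET #data = list_append(if_not_exists(#data, :empty_list), :item)",
            ExpressionAttributeNames = new Dictionary<string, string>
            {
                { "#data", "Data" }
            },
            ExpressionAttributeValues = new Dictionary<string, AttributeValue>
            {
                // Wrap the map in a one-element list so both operands are lists.
                { ":item", new AttributeValue
                    {
                        L = new List<AttributeValue>
                        {
                            new AttributeValue { M = dataDocument }
                        }
                    }
                },
                // Empty list used when "Data" does not exist yet.
                { ":empty_list", new AttributeValue { IsLSet = true } }
            }
        };
    }
}
```

Every placeholder declared in ExpressionAttributeValues has to be used by the expression, which is why ":id" is left out here.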


Solution

  • I'm not sure you understand DynamoDB correctly. There is a 400KB limit per item. Pushing small chunks of data into the same item and calling it a list of items does not get around that: no matter how you write it, a single item cannot store more than 400KB.

    You need to implement vertical partitioning, where your large object gets split into multiple items, each uniquely identified by its sort key.

    https://aws.amazon.com/blogs/database/use-vertical-partitioning-to-scale-data-efficiently-in-amazon-dynamodb/

    Your data would be stored in DynamoDB in a layout similar to the following:

    PK     SK     Data
    Item1  Part1  {data less than 400KB}
    Item1  Part2  {data less than 400KB}
    Item1  Part3  {data less than 400KB}
    Item2  Part1  {data less than 400KB}

    Obviously the PK and SK can be something more meaningful, where the PK relates to the object itself, and the SK relates to that individual part of the object.
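
The split described above could be sketched roughly as follows. This is only an illustration under stated assumptions: VerticalPartitioner, Split and the 350 KB budget are hypothetical and not part of the AWS SDK, and the JSON byte count is only an approximation of how DynamoDB measures item size.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.Json;

// Hypothetical helper: groups records into parts whose approximate
// serialized size stays under a byte budget.
public static class VerticalPartitioner
{
    // Assumed budget, kept below 400 KB to leave headroom for the PK, SK
    // and attribute names, which also count toward the item size.
    public const int DefaultMaxBytes = 350 * 1024;

    public static List<(string PK, string SK, List<T> Data)> Split<T>(
        string pk, IEnumerable<T> records, int maxBytes = DefaultMaxBytes)
    {
        var parts = new List<(string PK, string SK, List<T> Data)>();
        var current = new List<T>();
        var currentBytes = 0;

        foreach (var record in records)
        {
            // Approximate the stored size by the UTF-8 length of the JSON form.
            var size = Encoding.UTF8.GetByteCount(JsonSerializer.Serialize(record));
            if (size > maxBytes)
                throw new ArgumentException("A single record exceeds the part budget.");

            // Close the current part when this record would overflow it.
            if (current.Count > 0 && currentBytes + size > maxBytes)
            {
                parts.Add((pk, $"Part{parts.Count + 1}", current));
                current = new List<T>();
                currentBytes = 0;
            }

            current.Add(record);
            currentBytes += size;
        }

        // Emit the last, partially filled part.
        if (current.Count > 0)
            parts.Add((pk, $"Part{parts.Count + 1}", current));

        return parts;
    }
}
```

Calling VerticalPartitioner.Split(item.PK, item.Data) would give one (PK, SK, Data) tuple per item to write, each with its own put, and a Query on the PK reads the parts back. Sort keys are compared as strings, so zero-padding the part number (Part001, Part002) is worth considering if the parts need to come back in order.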