Search code examples
pythonamazon-web-servicesamazon-dynamodbboto3

Dynamodb batch delete based on sort key pattern


I have a DynamoDB table with the following structure:

sk             |  pk
---------------|-----
1#2023-12-01   |  abv
1#2023-12-02   |  abv
1#2023-12-03   |  abv
1#2023-12-04   |  abv
1#2023-12-05   |  abv
2#2023-12-01   |  abv
2#2023-12-02   |  abv
2#2023-12-03   |  abv
2#2023-12-04   |  abv
2#2023-12-05   |  abv
...
20#2023-12-11  |  abv
20#2023-12-12  |  abv
20#2023-12-12  |  abv

Now, I want to perform a batch delete operation on this table where pk = 'abv' and sk represents a dynamic integer between 1 and 30, followed by a literal # and then date part YYYY-MM-DD which have to less than the current date(assuming current date is 2023-12-12). Essentially, I want to remove all items where the date is less than 1-30#2023-12-12. So after the delete operation on the table, the final table should only contain items like:

sk             |  pk
---------------|-----
20#2023-12-12  |  abv
20#2023-12-12  |  abv

How can I achieve this in DynamoDB using a batch delete operation? Any guidance on constructing the batch delete request or any other optimized way to code it?. I am thinking this, but I am not a fan of the scan() operation of Dynamodb.

from datetime import datetime, timedelta
from typing import Dict, List
class Dynamodb:

    def batch_delete_old_data(self, pk: str):
        try:
            # Calculate the date to keep (e.g., today's date)
            date_to_keep = datetime.now().strftime('%Y-%m-%d')

            # Scan for all items with the specified pk
            response = self._table.scan(
                FilterExpression=Key('pk').eq(pk)
            )

            items_to_delete = [{'pk': item['pk'], 'sk': item['sk']} for item in response.get('Items', [])
                               if self.extract_date_part(item['sk']) < date_to_keep]

            with self._table.batch_writer() as batch:
                for item in items_to_delete:
                    batch.delete_item(Key=item)

            return {"message": "Old data cleanup successful"}

        except Exception as e:
            # Handle errors appropriately
            raise Exception(f"Error: {str(e)}")

    @staticmethod
    def extract_date_part(sk: str) -> str:
        # Extract the date part from the sk, assuming format "prefix#date"
        return sk.split('#')[-1] if '#' in sk else sk


Solution

  • It's not clear if you had an item with sort key 20#2023-12-02 would it also maintain to exist after the deletion? If so: You'll need to execute N Query requests to obtain the items you needs.

    If not: What you are trying to achieve is a range delete, which is not supported in DynamoDB.

    To achieve your use-case you must first Query to retrieve all of the items where SK between X and Y. So instead of Scan use Query which will be more optimal .

    With the results, loop through a BatchWrite deleting 25 items per batch. Or as you use batch writer, you can pass the entire result set and it'll chunk into 25 automatically.