pagination, amazon-dynamodb, dynamodb-queries, amazon-dynamodb-index

Is it correct to use a DynamoDB Scan operation with pagination instead of a Query with a GSI when I need all the items from the table?


I have read that the DynamoDB Scan operation is slow when the table is large. In my scenario, however, I need to extract every item from the table. Is it still preferable to avoid Scan? Since indexes are not free and I need all the items anyway, I am planning to use Scan.

  1. Is there any problem with choosing the Scan operation?
  2. Why does only Scan have a parallel option? Is Query parallel by default?
  3. If I use the Query operation with pagination, will it run sequentially or in parallel?

Solution

  • If you need all items, then Scan() is perfectly fine.

    Just be aware of two things:

    • DynamoDB returns at most 1 MB of data per call, so you'll need to call Scan() in a loop, setting ExclusiveStartKey to the previous response's LastEvaluatedKey until no LastEvaluatedKey comes back.
    • Scan() can quickly consume your provisioned RCUs, so watch for throttling errors and retry.
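The loop described above could look roughly like the following sketch. It assumes `client` is a boto3 low-level DynamoDB client (for example `boto3.client("dynamodb")`); the function name `scan_all` and the parameters `max_retries` and `base_delay` are made up for illustration. boto3 already retries throttled calls on its own and also offers `client.get_paginator("scan")`, so treat the explicit backoff here as a demonstration of the idea rather than something you must write yourself.

```python
import time


def scan_all(client, table_name, max_retries=5, base_delay=0.1):
    """Yield every item in the table, one 1 MB page at a time.

    `client` is assumed to behave like boto3.client("dynamodb").
    """
    kwargs = {"TableName": table_name}
    while True:
        for attempt in range(max_retries + 1):
            try:
                page = client.scan(**kwargs)
                break
            except Exception as exc:
                # botocore's ClientError exposes the error code in exc.response.
                code = getattr(exc, "response", {}).get("Error", {}).get("Code")
                throttled = code == "ProvisionedThroughputExceededException"
                if not throttled or attempt == max_retries:
                    raise
                # Exponential backoff before retrying the same page.
                time.sleep(min(base_delay * 2 ** attempt, 5.0))
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return  # no LastEvaluatedKey means the scan is complete
        kwargs["ExclusiveStartKey"] = last_key
```

Usage would be along the lines of `for item in scan_all(boto3.client("dynamodb"), "my-table"): ...`, where `"my-table"` is a placeholder name. Because the function is a generator, items are processed as pages arrive instead of being held in memory all at once.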

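    The recommendation against Scan() is about using Scan() + filter in place of Query() to fetch a subset of records. Scan() always reads the full table; a filter only discards items after they have been read, so you still pay the read cost for them.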
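To make the contrast concrete, here is a small sketch of the two request shapes. It assumes a boto3 low-level client and a hypothetical table whose partition key is a string attribute named `customer_id`; both function names are invented for this example. For brevity each function returns only the first page of results.

```python
def orders_via_query(client, table_name, customer_id):
    """Query: DynamoDB reads only the items under the given partition key."""
    response = client.query(
        TableName=table_name,
        KeyConditionExpression="customer_id = :c",
        ExpressionAttributeValues={":c": {"S": customer_id}},
    )
    return response["Items"]  # first page only; pagination omitted


def orders_via_scan_filter(client, table_name, customer_id):
    """Scan + filter: every item is read, then non-matching ones are dropped."""
    response = client.scan(
        TableName=table_name,
        FilterExpression="customer_id = :c",
        ExpressionAttributeValues={":c": {"S": customer_id}},
    )
    return response["Items"]  # first page only; pagination omitted
```

Both calls can return the same items, but the second one consumes read capacity for the whole table. That is the case the usual advice warns about, and it does not apply when you genuinely want every item.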

    Also note that, from a performance standpoint, Scan() supports parallel scans. From the API documentation:

    TotalSegments
    For a parallel Scan request, TotalSegments represents the total number of segments into which the Scan operation will be divided. The value of TotalSegments corresponds to the number of application workers that will perform the parallel scan. For example, if you want to use four application threads to scan a table or an index, specify a TotalSegments value of 4.

    But again, if you are using provisioned read capacity, a parallel scan will eat up RCUs quickly.
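A parallel scan with one worker per segment might be sketched as follows. This assumes a boto3 low-level client shared across threads (the boto3 documentation describes low-level clients as generally thread safe); `scan_segment` and `parallel_scan` are illustrative names, not part of any library. Each worker passes its own `Segment` number together with the shared `TotalSegments` value and paginates its segment independently.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_segment(client, table_name, segment, total_segments):
    """Read one segment to completion, following LastEvaluatedKey."""
    items = []
    kwargs = {
        "TableName": table_name,
        "Segment": segment,                # zero-based index of this worker
        "TotalSegments": total_segments,   # same value for every worker
    }
    while True:
        page = client.scan(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        kwargs["ExclusiveStartKey"] = last_key


def parallel_scan(client, table_name, total_segments=4):
    """Scan all segments concurrently and return the combined items."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(scan_segment, client, table_name, seg, total_segments)
            for seg in range(total_segments)
        ]
        return [item for future in futures for item in future.result()]
```

The sketch collects everything in memory and leaves out throttling handling to stay short. With provisioned capacity, a lower `total_segments` is one simple way to keep the scan from starving other readers.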