amazon-simpledb boto

Using SimpleDB NextToken when records in query are updated


I have a case where we are doing a select on a domain like:

select * from mydomain where some_val = 'foo' and some_date < '2012-03-01T00:00+01:00'

While iterating over the results of this query, we do some work on each row and then update it, setting the field some_date to the current date/time to mark it as processed.

The question I have is: will the NextToken request break when it goes back to SimpleDB to get the next set of records? By the time it fetches the next batch, all of the records in the first batch will have a some_date value that is no longer within the original query range.

I don't know how the next token is implemented, so I can't tell whether it's just a pointer to the next item or whether it is some kind of offset that might "skip" a whole batch of records.

So if we retrieved 3 records at a time and I had this in my domain:

record 1, '2012-01-12T19:20+01:00'
record 2, '2012-02-14T19:20+01:00'
record 3, '2012-01-22T19:20+01:00'
record 4, '2012-01-21T19:20+01:00'
record 5, '2012-02-22T19:20+01:00'
record 6, '2012-01-20T19:20+01:00'
record 7, '2012-01-18T19:20+01:00'
record 8, '2012-01-17T19:20+01:00'
record 9, '2012-02-12T19:20+01:00'

On my first execution I would get records 1, 2, 3. If I set their some_date field to '2012-03-12T19:20+01:00' before going back for the next-token batch, would the next-token request then return 4,5,6? Or would it return 7,8,9 (because the token was set to start at the 4th record, and 1,2,3 are no longer in the result set)?

If it is important - we are using the boto library (python).
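
To make the two possibilities concrete, here is a toy in-memory model of the nine records above. It is only a sketch of the two hypotheses, not how SimpleDB is implemented; the names `page_by_offset`, `page_by_pointer` and `run` are made up for illustration.

```python
# Toy model only: this is NOT SimpleDB's implementation, just the two
# pagination behaviours the question is worried about.
CUTOFF = "2012-03-01T00:00+01:00"
PROCESSED = "2012-03-12T19:20+01:00"


def make_domain():
    # Record number -> some_date, matching the nine records listed above.
    days = ["2012-01-12", "2012-02-14", "2012-01-22",
            "2012-01-21", "2012-02-22", "2012-01-20",
            "2012-01-18", "2012-01-17", "2012-02-12"]
    return {i + 1: day + "T19:20+01:00" for i, day in enumerate(days)}


def matching(domain):
    # Timestamps in one fixed format and offset compare correctly as strings.
    return [k for k in sorted(domain) if domain[k] < CUTOFF]


def page_by_offset(domain, offset, size=3):
    # Hypothesis A: the token is an offset into the *current* result set.
    ids = matching(domain)[offset:offset + size]
    return ids, offset + size


def page_by_pointer(domain, last_seen, size=3):
    # Hypothesis B: the token remembers the last item that was returned.
    ids = [k for k in matching(domain) if k > last_seen][:size]
    return ids, (ids[-1] if ids else last_seen)


def run(pager, start):
    domain = make_domain()
    token, pages = start, []
    while True:
        ids, token = pager(domain, token)
        if not ids:
            return pages
        for k in ids:
            domain[k] = PROCESSED  # mark as processed before the next page
        pages.append(ids)


offset_pages = run(page_by_offset, 0)
pointer_pages = run(page_by_pointer, 0)
print(offset_pages)   # [[1, 2, 3], [7, 8, 9]]
print(pointer_pages)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

With an offset-style token, records 4, 5, 6 would be skipped; with a pointer-style token, every record is visited once.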


Solution

  • would the next-token request then return 4,5,6? Or would it return 7,8,9 [...]?

    Good question, this can indeed be a bit confusing. Still, anything but the former (i.e. 4,5,6) wouldn't make sense for practical usage, and Amazon SimpleDB behaves accordingly, see Select:

    Operations that run longer than 5 seconds return a time-out error response or a partial or empty result set. Partial and empty result sets contain a NextToken value, which allows you to continue the operation from where it left off [emphasis mine]

    Please take note of the additional note in section Request Parameters though, which might be a bit surprising:

    Note

    The response to a Select operation with ConsistentRead set to true returns a consistent read. However, for any following Select operation requests that include a NextToken value, Amazon SimpleDB ignores the ConsistentRead field, and the subsequent results are eventually consistent. [emphasis mine]
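
    Because the follow-up pages are only eventually consistent, a later page could in principle still show an item you have already updated. A minimal defensive sketch, assuming boto 2's `Domain.select`, `Domain.get_attributes` and `Domain.put_attributes` behave as I remember them (please verify against your boto version), and assuming `some_date` is single-valued; `process_stale` and `handle` are hypothetical names:

```python
def process_stale(domain, cutoff, now, handle):
    """Process items older than `cutoff` and stamp them with `now`.

    `domain` is assumed to be a boto.sdb Domain (or anything exposing the
    same select / get_attributes / put_attributes methods).
    `handle` is a hypothetical callback that does the real work on one item.
    """
    query = ("select * from mydomain where some_val = 'foo' "
             "and some_date < '%s'" % cutoff)
    done = set()
    # In boto 2 the result of select() is iterated lazily and is expected
    # to follow NextToken across pages on its own (assumption).
    for item in domain.select(query, consistent_read=True):
        if item.name in done:
            continue  # already handled in this run
        # Pages after the first may be stale, so re-read with a consistent
        # read before trusting the item's some_date.
        latest = domain.get_attributes(item.name, consistent_read=True)
        current = latest.get("some_date")
        if current is None or current >= cutoff:
            continue  # deleted, or already processed elsewhere
        handle(item)
        domain.put_attributes(item.name, {"some_date": now}, replace=True)
        done.add(item.name)
    return done
```

    The extra consistent read costs one request per item, so it only makes sense if processing an item twice would actually hurt.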