Search code examples
pythongoogle-app-enginetransactionsgoogle-cloud-datastoreconsistency

AppEngine: Maintaining DataStore Consistency When Creating Records


I've hit a small dilemma! I have a handler called vote; when it is invoked it sets a user's vote to whatever they have picked. To remember what options they previously picked, I store a VoteRecord options which details what their current vote is set to.

Of course, the first time they vote, I have to create the object and store it. But successive votes should just change the value of the existing VoteRecord. But he comes the problem: under some circumstances two VoteRecords can be created. It's rare (only happened once in all 500 votes we've seen so far) but still bad when it does.

The issue happens because two separate handlers both do essentially this:

query = VoteRecord.all().filter('user =', session.user).filter('poll =', poll)

if query.count(1) > 0:
 vote = query[0]

 poll.votes[vote.option] -= 1
 poll.votes[option] += 1
 poll.put()

 vote.option = option
 vote.updated = datetime.now()
 vote.put()
else:
 vote = VoteRecord()
 vote.user = session.user
 vote.poll = poll
 vote.option = option
 vote.put()

 poll.votes[option] += 1
 poll.put()

 session.user.votes += 1
 session.user.xp += 3
 session.user.put()

 incr('votes')

My question is: what is the most effective and fastest way to handle these requests while ensuring that no request is lost and no request creates two VoteRecord objects?


Solution

  • The issue is this part:

    if vote.count(1) == 0:
        obj = VoteRecord()
        obj.user = user
        obj.option = option
        obj.put()
    

    Without a transaction, your code could run in this order in two interpreter instances:

    if vote.count(1) == 0:
        obj = VoteRecord()
        obj.user = user
    
    
    if vote.count(1) == 0:
        obj = VoteRecord()
        obj.user = user
        obj.option = option
        obj.put()
    
    
        obj.option = option
        obj.put()
    

    Or any weird combination thereof. The problem is the count test runs again before the put has occured, so the second thread goes through the first part of the conditional instead of the second.

    You can fix this by putting the code in a function and then using

    db.run_in_transaction()
    

    to run the function.

    Problem is you seem to be relying on the count of objects returned by a query for your decision logic that needs to be put in the transaction. If you read the Google I/O talks or look at the group you'll see that this is not recommended. That's because you can't transactionalize a query. Instead, you should store the count as an entity value somewhere, query for it outside of the transaction function, and then pass the key for that entity to your transaction function.

    Here's an example of a transaction function that checks an entity property. It's passed the key as a parameter:

    def checkAndLockPage(pageKey):
      page = db.get(pageKey)
      if page.locked:
        return False
      else:
        page.locked = True
        page.put()
        return True
    

    Only one user at a time can lock this entity, and there will never be any duplicate locks.