Tags: google-app-engine, google-cloud-platform, google-cloud-firestore, google-cloud-datastore, sharding

Firestore in Datastore mode: a distributed counter that can scale its shards up based on traffic


In Firestore in Datastore mode, the recommended way to handle a counter with a high write rate (such as profile views on a website) is to use sharded (distributed) counters.

The problem I have is that with distributed counters you need to pick how many shards to use. This is addressed here as well. For example, some profiles may get many more views per second than others (one profile may belong to a famous person while another belongs to a regular person) and therefore need more shards.
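To make the trade-off concrete, here is a rough sizing rule. Assuming each shard should sustain at most about one write per second (the commonly cited guideline for a single entity; treat the exact figure as an assumption), a counter needs roughly its peak writes per second divided by that per-shard rate. The helper `shards_needed` is hypothetical and not part of any Datastore API.

```python
import math


def shards_needed(peak_writes_per_sec: float, per_shard_writes_per_sec: float = 1.0) -> int:
    """Estimate how many shards keep each shard under an assumed write rate.

    per_shard_writes_per_sec defaults to 1.0, an assumed sustained-write
    guideline for a single entity; adjust it to the limit you plan around.
    """
    if peak_writes_per_sec <= 0:
        return 1  # an idle counter still needs one place to store its value
    return max(1, math.ceil(peak_writes_per_sec / per_shard_writes_per_sec))


# A regular profile vs. a famous one.
print(shards_needed(0.2))  # 1
print(shards_needed(150))  # 150
```

A fixed choice has to be sized for the busiest profile, which wastes shards (and read cost, since every shard is summed) on quiet profiles. That is the motivation for growing the shard count per counter.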

Is there a way to write a distributed counter that scales its shards up when a page is getting a lot of views per second?

I was thinking of detecting a Datastore contention error and adding more shards when that happens.
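The grow-on-contention idea can be sketched independently of any Datastore client. In this sketch, `ContentionError` and the `write` callable are hypothetical stand-ins for a transaction that failed because of contention and for a transactional shard write; the loop adds one shard per contention failure, up to an assumed cap.

```python
from typing import Callable


class ContentionError(Exception):
    """Hypothetical stand-in for a transaction that failed due to contention."""


def increment_with_growth(
    write: Callable[[int], None],
    num_shards: int,
    max_shards: int = 100,
    max_attempts: int = 10,
) -> int:
    """Try to write; on contention, add a shard and retry.

    `write(n)` performs one increment against a counter with `n` shards.
    Returns the shard count that finally succeeded so the caller can persist it.
    """
    for _ in range(max_attempts):
        try:
            write(num_shards)
            return num_shards
        except ContentionError:
            if num_shards >= max_shards:
                raise  # already at the cap; let the caller decide what to do
            num_shards += 1  # spread the next attempt over one more shard
    raise ContentionError("still contended after %d attempts" % max_attempts)


# Fake writer that keeps failing until the counter has at least 3 shards.
def fake_write(n: int) -> None:
    if n < 3:
        raise ContentionError()


print(increment_with_growth(fake_write, num_shards=1))  # 3
```

The cap matters because many concurrent requests can hit contention at the same moment, and without a limit each of them would add a shard.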

I noticed there is a new extension for Cloud Firestore that seems to do what I am asking for. However, I am not using Cloud Firestore in Native mode; I am using Firestore in Datastore mode, which is similar under the hood but still different.


Solution

  • The original Datastore distributed counters example:

    import random
    
    from google.appengine.ext import ndb
    
    
    NUM_SHARDS = 20
    
    class SimpleCounterShard(ndb.Model):
        """Shards for the counter"""
        count = ndb.IntegerProperty(default=0)
    
    
    def get_count():
        """Retrieve the value for a given sharded counter.
    
        Returns:
            Integer; the cumulative count of all sharded counters.
        """
        total = 0
        for counter in SimpleCounterShard.query():
            total += counter.count
        return total
    
    
    @ndb.transactional
    def increment():
        """Increment the value for a given sharded counter."""
        shard_string_index = str(random.randint(0, NUM_SHARDS - 1))
        counter = SimpleCounterShard.get_by_id(shard_string_index)
        if counter is None:
            counter = SimpleCounterShard(id=shard_string_index)
        counter.count += 1
        counter.put()
    

    This example uses a fixed number of shards, whereas the Firestore example keeps the number of shards in a separate entity. You can update the code above along these lines, storing the shard count on a root counter entity and creating the shards as its children:

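    from google.appengine.api import datastore_errors
    
    
    class RootCounter(ndb.Model):
      count = ndb.IntegerProperty(default=0)
      num_shards = ndb.IntegerProperty(default=0)
    
      def get_count(self):
        # Value kept on the root before sharding started, plus every child shard.
        total = self.count
        if self.num_shards > 0:
          total += sum(e.count for e in SimpleCounterShard.query(ancestor=self.key))
        return total
    
      def increment(self, max_attempts=5):
        for _ in range(max_attempts):
          try:
            self._increment()
            return
          except datastore_errors.TransactionFailedError:
            # Contention detected: add one more shard, then retry.
            self._add_shard()
        raise datastore_errors.TransactionFailedError('Counter is still contended.')
    
      @ndb.transactional(retries=3)
      def _add_shard(self):
        root = self.key.get()  # re-read so a concurrent update is not overwritten
        root.num_shards += 1
        root.put()
    
      @ndb.transactional(retries=1)
      def _increment(self):
        root = self.key.get()  # fresh copy, so a failed attempt cannot double-count
        if root.num_shards > 0:
          # Write to a random child shard instead of the contended root.
          shard_id = str(random.randint(0, root.num_shards - 1))
          shard = SimpleCounterShard.get_by_id(shard_id, parent=self.key)
          if shard is None:
            shard = SimpleCounterShard(id=shard_id, parent=self.key)
          shard.count += 1
          shard.put()
        else:
          root.count += 1
          root.put()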
    

    The important difference since Firestore in Datastore mode was released is that it is strongly consistent, and entity groups no longer carry their old restrictions (for example, the limit of one write per second per entity group). A query over the shards therefore returns an exact total, and the shards fit neatly into the hierarchy as children of the root counter.
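Because the `ndb` code in the answer needs App Engine to run, the following pure-Python, in-memory model of the same scheme can be used to experiment with the logic. Everything in it is hypothetical: the `InMemoryRootCounter` class and the `contended` flag, which stands in for a transaction that failed because of contention. It demonstrates the invariant `get_count` relies on: the value stored on the root before sharding started, plus the sum of all shards, equals the number of increments.

```python
import random
from typing import Dict


class ContentionError(Exception):
    """Hypothetical stand-in for a failed, contended transaction."""


class InMemoryRootCounter:
    """In-memory model of a root counter that grows child shards on contention."""

    def __init__(self) -> None:
        self.count = 0                    # value stored on the root itself
        self.num_shards = 0               # 0 means "not sharded yet"
        self.shards: Dict[int, int] = {}  # shard id -> partial count

    def _try_increment(self, contended: bool) -> None:
        if contended:
            # Simulated failed transaction: nothing is written.
            raise ContentionError()
        if self.num_shards > 0:
            shard_id = random.randint(0, self.num_shards - 1)
            self.shards[shard_id] = self.shards.get(shard_id, 0) + 1
        else:
            self.count += 1

    def increment(self, contended: bool = False) -> None:
        try:
            self._try_increment(contended)
        except ContentionError:
            self.num_shards += 1        # add one shard...
            self._try_increment(False)  # ...and retry (assumed to succeed here)

    def get_count(self) -> int:
        # Root value from before sharding plus every shard's partial count.
        return self.count + sum(self.shards.values())


random.seed(0)
counter = InMemoryRootCounter()
for i in range(1000):
    counter.increment(contended=(i in (10, 500)))  # two simulated contention events
print(counter.count, counter.num_shards, counter.get_count())  # 10 2 1000
```

The first ten increments land on the root; after the first contention event all writes go to shards, so the root value stays frozen at 10 while the total keeps growing.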