firebase firebase-realtime-database sharding

How to shard Realtime Database data for a chat app?

I am building a chat app and want to use Realtime Database. I expect my database to reach the limit of 200k simultaneous connections.

So I have read the documentation about scaling and sharding the data.

However, I don't understand how to handle this for a chat app. Let's say I have a groups reference that contains the IDs of the users in each group, and the messages for that group.

If I want to scale, I need to create a new DB instance and start writing groups there too, as the first DB may have more than 200k simultaneous connections.

That means users may belong to groups in multiple databases, which already seems weird and not such a good idea.

So I would like to know:

How can I shard the groups reference?
How can I (or even should I) make users connect to multiple DBs according to the groups they belong to?

It seems to be a very complicated way to do things... Am I not understanding this correctly?

Solution

I'm sure there are plenty of ways to shard a database but here's how I've done it. This involves selecting a shard while creating a new chat. For this answer, let's assume there are 4 users: U1, U2, U3 and U4, and 2 shards (excluding the default): shard1 and shard2.

Whenever a user creates a new chat, select a shard and create a new node for that chat. You should store the list of each user's chats somewhere else, along with the shard ID. The default database instance seems to be great for that, but Firestore works too. So an object containing the information for a chat will look something like:

{
  chatID: "c40f15af19a94b6f84117747337b9f7a",
  createdBy: "U1",
  users: ["U1", "U2", "U3"],
  shardId: "shard2"
}

Now you have the list of chat IDs along with their shards, so just connect your listeners. Again, it depends on what the expected behavior is. In my case, I only had to listen to the data selected by the user (i.e. the active chat).

Try to divide chats evenly across all shards. You could pick the shard with the fewest active chats (you will have to store the number of chats created per shard somewhere else, such as the default shard), or something like round robin may be useful. At the same time, take the user creating the chat into account.

Incrementing the count of chats in a shard whenever a new chat is created may be a good way to keep track of this.
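As a rough illustration of the "fewest chats" idea, here is a minimal sketch. It assumes the per-shard counters have already been read into a plain object; the function names and the shape of `chatCounts` are made up for this example and are not part of any Firebase API.

```javascript
// Hypothetical in-memory copy of the per-shard chat counters. In practice
// these would be read from the default database or from Firestore.
function pickLeastLoadedShard(chatCounts) {
  const shardIds = Object.keys(chatCounts);
  if (shardIds.length === 0) {
    throw new Error("No shards configured");
  }
  // Keep the shard with the smallest counter; ties go to the first one seen.
  return shardIds.reduce((best, id) =>
    chatCounts[id] < chatCounts[best] ? id : best
  );
}

// Picks a shard and returns a new counters object with that shard incremented.
function assignChatToShard(chatCounts) {
  const shardId = pickLeastLoadedShard(chatCounts);
  return {
    shardId,
    chatCounts: { ...chatCounts, [shardId]: chatCounts[shardId] + 1 },
  };
}

const result = assignChatToShard({ shard1: 12, shard2: 7 });
console.log(result.shardId); // "shard2"
```

With real counters the read and the increment should happen atomically (for example in a Realtime Database transaction), otherwise two chats created at the same moment could see the same stale counts.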

In the end, I think it's just about how you divide your chats among shards, and there are many algorithms you can use. Having a list of each user's chats that contains the shard name seems to be an easy way to do so, as above. I personally prefer Firestore for storing the list of chats, so it's easier to query based on the creator of a chat, the chats that user U2 is part of, and so on.

Creating new chats using a Cloud Function (or your own servers) is preferred, so no one can just spam a single database shard by reverse engineering the app.
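To make the point concrete, here is a sketch of the logic such a server-side handler might run. It is plain JavaScript with no Firebase calls; `buildChatRecord`, the request shape, and the `pickShard` callback are hypothetical names for this example. The important part is that any `shardId` sent by the client is ignored.

```javascript
const { randomUUID } = require("crypto");

// Builds the chat record on the server. Any shardId sent by the client is
// ignored, so a modified client cannot direct all chats to one shard.
function buildChatRecord(request, callerUid, pickShard) {
  if (!callerUid) {
    throw new Error("Caller must be authenticated");
  }
  const requested = Array.isArray(request.users) ? request.users : [];
  // De-duplicate members and always include the creator.
  const users = [...new Set([callerUid, ...requested])];
  if (users.length < 2) {
    throw new Error("A chat needs at least two users");
  }
  return {
    chatID: randomUUID().replace(/-/g, ""),
    createdBy: callerUid,
    users,
    shardId: pickShard(), // chosen on the server only
  };
}

// Example: the client tries to force shard1, but the server picks shard2.
const record = buildChatRecord(
  { users: ["U2", "U3"], shardId: "shard1" },
  "U1",
  () => "shard2"
);
console.log(record.shardId, record.users);
```

In a real Cloud Function the caller's UID would come from the authenticated request context rather than from the request body, and the returned record would then be written to wherever the chat list is stored.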

This way all your messages will be stored in Realtime Database, but the basic information about the chats is in Firestore (not necessary, but it makes chats easier to query). When a user opens the chat app, load the chats they are part of.

Here's a sample that loads the chat list from Firestore and attaches Realtime Database listeners:

const db = firebase.firestore()
// loading user's chats
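// "users" is the member array from the chat object above; replace "myUID" with the current user's UID
const chatsSnapshot = await db.collection("chats").where("users", "array-contains", "myUID").get()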

// A QuerySnapshot exposes its documents through .docs
const chatsInfo = chatsSnapshot.docs.map((c) => ({...c.data(), id: c.id}))


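// Realtime DB shards
// (app1, app2 and app3 are assumed to be Firebase app instances initialized
// elsewhere, each with the databaseURL of one database instance)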
const shards = {
  shard1: firebase.database(app1),
  shard2: firebase.database(app2),
  shard3: firebase.database(app3)
}

// Run a loop on chatsInfo and render chats to your app
for (const chat of chatsInfo) {
  // Limit to first N messages if necessary
  const chatRef = shards[chat.shardId].ref(chat.id);
  chatRef.on('value', (snapshot) => {
    const data = snapshot.val();
    // Render messages
  });
}

You don't need to load all the chats as I've shown above. Load messages only for the chat that is active.
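One possible way to do that is a small helper that keeps a single listener alive and swaps it whenever the user opens another chat. This is only a sketch: `createActiveChatListener` and `onMessages` are names invented here, `shards` is the same map of shard ID to database instance as in the code above, and the default of 50 messages is an arbitrary choice.

```javascript
// shards: map of shard ID -> Realtime Database instance.
// onMessages(chatId, data): hypothetical callback that renders the messages.
function createActiveChatListener(shards, onMessages, limit = 50) {
  let detach = null;

  return function setActiveChat(chat) {
    // Stop listening to the previously active chat, if any.
    if (detach) {
      detach();
      detach = null;
    }
    if (!chat) return;

    // Only the most recent `limit` children of the chat node are loaded.
    const query = shards[chat.shardId].ref(chat.id).limitToLast(limit);
    const handler = (snapshot) => onMessages(chat.id, snapshot.val());
    query.on("value", handler);
    detach = () => query.off("value", handler);
  };
}
```

Call the returned function with an entry from chatsInfo when the user opens a chat, and with null when no chat is open. At most one listener is attached at any time.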