Tags: node.js, firebase, google-cloud-firestore, google-cloud-dataflow, google-cloud-pubsub

How can I efficiently insert more than 1 million records into Firestore?


Description:

I am working on a project where I need to insert more than 1 million records into Google Firestore. Currently, my approach is not efficient enough and the process is extremely slow. I am looking for a way to optimize this process.

What I've tried:

  1. Individual inserts: I tried inserting the records one by one using a loop, but this is very slow.
  2. Batch writes: I attempted to use batch writes, but there seems to be a limit on the number of operations I can perform in a single batch.
  3. Firestore SDK for Node.js: I have been using the Firestore SDK for Node.js to manage the inserts.

Current code:

const { Firestore } = require('@google-cloud/firestore');

// Initialize Firestore
const db = new Firestore();

// Data to insert (example)
const data = Array.from({ length: 1000000 }, (_, i) => ({
  field1: `value${i}`,
  field2: `value${i}`,
}));

// Individual insert
async function insertData() {
  for (const item of data) {
    await db.collection('my_collection').add(item);
  }
}

insertData().then(() => {
  console.log('Inserts completed');
}).catch(error => {
  console.error('Error inserting data:', error);
});

Problem:

The above code is extremely slow for such a large number of records. I understand that Firestore has limitations regarding the number of operations per second and per batch, and I would like to know the best way to handle this situation.

Questions:

  • What is the best practice for inserting a large number of records into Firestore?
  • How can I optimize the process to be more efficient?
  • Are there specific limits I need to be aware of and how can I overcome them?
  • Is it possible to use other Google Cloud services, such as Pub/Sub or Dataflow, to solve this problem and how could I integrate them into the bulk insert process?

I appreciate any suggestions or code examples that can help improve the performance of bulk inserts into Firestore.


Solution

  • You've picked pretty much the slowest possible approach here: because you await each individual write, the writes are executed sequentially rather than in parallel.

    To improve performance, execute the writes in parallel: remove the per-write await and use a single await Promise.all(...) for each group of roughly 100 documents. For an example of this, see my answer here: Updating Firestore Documents Using Firebase Cloud Functions is Extremely Slow

    Also see: What is the fastest way to write a lot of documents to Firestore?
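
    A minimal sketch of the chunked Promise.all approach described above. The helper name writeInChunks, its writeOne callback, and the default chunk size of 100 are illustrative assumptions, not part of the Firestore SDK; the commented wiring at the bottom assumes @google-cloud/firestore is installed and credentials are configured.

```javascript
// Hypothetical helper: runs `writeOne` for every item, starting one chunk of
// writes in parallel and waiting for that chunk before starting the next.
async function writeInChunks(items, writeOne, chunkSize = 100) {
  let written = 0;
  for (let i = 0; i < items.length; i += chunkSize) {
    const chunk = items.slice(i, i + chunkSize);
    // All writes in this chunk are started at once; a single await covers them.
    await Promise.all(chunk.map((item) => writeOne(item)));
    written += chunk.length;
  }
  return written;
}

// Possible wiring against Firestore, reusing `data` from the question:
// const { Firestore } = require('@google-cloud/firestore');
// const db = new Firestore();
// const col = db.collection('my_collection');
// writeInChunks(data, (item) => col.add(item), 100)
//   .then((n) => console.log(`Inserted ${n} documents`))
//   .catch((error) => console.error('Error inserting data:', error));
```

    Note that Promise.all rejects as soon as any write in a chunk fails, so a real import would probably want per-write error handling or retries. The best chunk size depends on your data and quota, so treat 100 as a starting point.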


    For bulk write operations from server-side processes, also consider using BulkWriter, which is typically much faster than individual write operations. See https://cloud.google.com/nodejs/docs/reference/firestore/latest/firestore/bulkwriter

    The "fastest way" answer I linked above was written before I discovered BulkWriter (or before it even existed? 🤔)
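
    For reference, a rough sketch of what a BulkWriter-based insert could look like. The function name bulkInsert, the retry cap of 3, and the returned summary object are assumptions made for illustration; db is expected to be a Firestore instance from @google-cloud/firestore, passed in by the caller.

```javascript
// Hypothetical helper: queues one create() per item on a BulkWriter and
// resolves once every queued write has been flushed.
async function bulkInsert(db, collectionPath, items) {
  const writer = db.bulkWriter();
  const collection = db.collection(collectionPath);
  let failed = 0;

  // Retry a failed write a few times before giving up.
  // The cap of 3 attempts is an arbitrary choice for this sketch.
  writer.onWriteError((error) => {
    if (error.failedAttempts < 3) return true; // ask BulkWriter to retry
    failed++;
    return false; // give up; the promise returned by create() rejects
  });

  for (const item of items) {
    // doc() with no argument returns a reference with an auto-generated ID.
    // create() only enqueues the write; BulkWriter batches and throttles it.
    // The failure was already counted above, so the rejection is swallowed.
    writer.create(collection.doc(), item).catch(() => {});
  }

  // close() flushes all pending writes and waits for them to finish.
  await writer.close();
  return { attempted: items.length, failed };
}

// Possible wiring, reusing `data` from the question:
// const { Firestore } = require('@google-cloud/firestore');
// bulkInsert(new Firestore(), 'my_collection', data)
//   .then((result) => console.log('Inserts completed', result))
//   .catch((error) => console.error('Error inserting data:', error));
```

    Unlike the chunked Promise.all version, there is no manual chunking here: BulkWriter decides how to group and pace the writes, which is a large part of why it tends to be faster for big imports.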