Tags: javascript, node.js, firebase, google-cloud-platform, google-cloud-firestore

What are the implications of using the select() query on a large Firestore doc versus splitting the data between multiple docs?


I'm looking to store a large amount of binned time series data in Firestore. Each key will be the timestamp that marks the start of that binning period (e.g. "1716505200").

I want to be able to efficiently retrieve the data while minimizing the number of document reads I perform. I have seen that you can use select() in the Node.js SDK to apply a field mask. Since I know every key in the database (because the binning process always runs on a regular interval), I figured I could use select() to grab the set of keys that fall within a specific time frame.
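A minimal sketch of that idea in Node.js, assuming hourly bins and one field per bin, named after the bin's start timestamp in seconds. The names `binKeys`, `fetchBins`, `BIN_SECONDS` and the `series` collection are hypothetical and only there for illustration; `select()` and `get()` are the query methods of the server SDK.

```javascript
// Sketch only: binKeys, fetchBins, BIN_SECONDS and the "series" collection
// are illustrative assumptions, not names taken from the question.

const BIN_SECONDS = 3600; // assumed bin width: one hour

// Build the field names (bin start timestamps, in seconds, as strings)
// that fall inside the half-open range [startSec, endSec).
// Assumes bins are aligned to multiples of binSeconds.
function binKeys(startSec, endSec, binSeconds = BIN_SECONDS) {
  const first = Math.ceil(startSec / binSeconds) * binSeconds;
  const keys = [];
  for (let t = first; t < endSec; t += binSeconds) {
    keys.push(String(t));
  }
  return keys;
}

// Apply a field mask to a Firestore query so that only the requested
// bins are returned. `query` is a Query or CollectionReference from a
// server SDK; select() is not offered by the web client SDK.
async function fetchBins(query, keys) {
  const snapshot = await query.select(...keys).get();
  return snapshot.docs.map((doc) => ({ id: doc.id, bins: doc.data() }));
}

// Possible usage (requires the @google-cloud/firestore package and credentials):
//   const { Firestore } = require("@google-cloud/firestore");
//   const db = new Firestore();
//   const keys = binKeys(1716505200, 1716505200 + 3 * 3600);
//   const rows = await fetchBins(db.collection("series"), keys);
```

Because the keys are derived from the time range rather than read from the database, the field mask can be built before any request is made.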

This leads me to my question: what are the downsides of putting close to the maximum amount of data points (1 MiB) inside a single document and using select() to grab certain sections of it? As far as I understand, this will minimize the number of reads compared to splitting the data between smaller documents, and will not incur any further costs. The potential downside I see is a performance hit. Is there any information available on the efficiency of using select() versus grabbing multiple documents in full? Also, am I incorrect about select() not incurring further costs?

I'm currently following an approach that has each document store 24 hours of data. This is working fine for the time being but I am conscious of the number of reads I could save if I took the "mono-document" approach.
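To make the comparison concrete, here is a rough sketch of the read counts involved. It assumes the daily documents are aligned to UTC day boundaries and that the whole requested range fits inside one "mono-document"; the function names are hypothetical.

```javascript
// Sketch only: dailyDocReads and monoDocReads are illustrative helpers.

const DAY_SECONDS = 86400;

// Number of day-sized documents overlapped by the half-open range
// [startSec, endSec), assuming one document per UTC day.
function dailyDocReads(startSec, endSec) {
  if (endSec <= startSec) return 0;
  const firstDay = Math.floor(startSec / DAY_SECONDS);
  const lastDay = Math.floor((endSec - 1) / DAY_SECONDS);
  return lastDay - firstDay + 1;
}

// With everything in one document, any non-empty range costs one read,
// assuming the range does not spill over into a second document.
function monoDocReads(startSec, endSec) {
  return endSec <= startSec ? 0 : 1;
}

// A 7-day window starting at 1716505200 (23:00 UTC) touches 8 daily documents.
console.log(dailyDocReads(1716505200, 1716505200 + 7 * DAY_SECONDS)); // 8
console.log(monoDocReads(1716505200, 1716505200 + 7 * DAY_SECONDS)); // 1
```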


Solution

  • Downsides of putting near the maximum amount of data points inside a single document (1MiB) and using select() to grab certain sections of it?

    A document that weighs almost 1 MiB has no downside in itself, other than the bandwidth you consume when reading it in full. Since you're planning to use select() and get only the fields you are interested in, that bandwidth cost is limited to the selected fields, so I see no downsides in your case.

    As far as I understand, this will minimize the number of reads compared to splitting the data between smaller documents, and not incur any further costs.

    Yes, that is correct. You pay for a single document read, plus the bandwidth for the fields that are actually returned.

    The potential downside I see is a hit on the performance.

    I don't see performance issues in your approach.

    Also, am I incorrect about select not incurring further costs?

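    No, you're not incorrect. As mentioned above, select() doesn't incur any further costs: you still pay for a single document read, and you consume less bandwidth because only the selected fields are returned.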

    If you want to monitor the size of your Firestore documents, you can use a Firebase Extension called Firestore Document Size, which:

    Creates a key-value pair in a specified Realtime Database location, each time a new document is added/updated in a specified Firestore collection.
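For a quick estimate without the extension, the size of such a document can be approximated by hand. The sketch below follows the rules in Firestore's storage size documentation as I understand them: a string costs its UTF-8 byte length plus 1, a number costs 8 bytes, the document name costs the string size of each path segment plus 16 bytes, and each document adds 32 bytes. It assumes every bin is a single numeric value; the function names and the `series/sensor-1` path are hypothetical.

```javascript
// Sketch only: helper names and the "series/sensor-1" path are assumptions.

const MAX_DOC_BYTES = 1024 * 1024; // 1 MiB document size limit
const NUMBER_BYTES = 8; // assumed: each bin stores one numeric value

// Size of a string: UTF-8 byte length plus 1.
function stringSize(s) {
  return Buffer.byteLength(s, "utf8") + 1;
}

// Size of a document name: each path segment as a string, plus 16 bytes.
function documentNameSize(path) {
  return path.split("/").reduce((sum, seg) => sum + stringSize(seg), 0) + 16;
}

// Estimated size of a document holding binCount numeric bins whose
// field names have the same length as sampleKey.
function estimateDocSize(path, binCount, sampleKey = "1716505200") {
  const perBin = stringSize(sampleKey) + NUMBER_BYTES;
  return documentNameSize(path) + binCount * perBin + 32;
}

// Largest number of such bins that stays within the 1 MiB limit.
function maxBins(path, sampleKey = "1716505200") {
  const perBin = stringSize(sampleKey) + NUMBER_BYTES;
  return Math.floor((MAX_DOC_BYTES - documentNameSize(path) - 32) / perBin);
}

console.log(estimateDocSize("series/sensor-1", 24)); // 520 bytes for 24 hourly bins
console.log(maxBins("series/sensor-1")); // 55184 bins
```

This only models the size limit. Bins that store maps or arrays instead of a single number take more space, and other per-document limits listed in the Firestore quotas documentation may be reached before 1 MiB.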