Search code examples
firebasegoogle-cloud-firestorenosqlnosql-aggregation

Firebase / NoSQL - How to aggregate data for statistics


I'm creating my first ever project with Firebase, and I come to the point when I need some statistics based on user input. I know Firebase (or NoSQL databases in general) are not ideal for statistics but they work for me in any other cases so I would like to give it a try.

What I have:

I work on the application where people can invite a friend to work for their company, so I have a collection of "referrals" where ID of each referral is basically UserID of a user to who the referral belongs, and then there is a subcollection with name "items" where data are stored.

enter image description here

How my data looks like: Each item have these data:

  • applicant
  • appliedDate
  • position(part of position is positionId & department on which this position is coming from)
  • status

What I wanted is to let user to make statistics based on:

  • date range
  • status
  • department

What I was thinking about:

It's probably not the best idea to let firebase iterate over all referrals once users make requests as it may get really expensive on firebase. What I was thinking of is using cloudfunctions to calculate statistics always when something change e.g. when a new applicant applies I will increase the counter by one and the same for a counter to a specific department. However I feel like this make work for total numbers or for predefined queries e.g. "LAST MONTH" but once I will not know what dates user will select it start to get tricky.

Any idea how can I design something like this? Thanks a lot!


Solution

  • What you're considering is the idiomatic approach to calculate aggregated in Firestore, and most NoSQL databases. If you follow this pattern, Firestore is quite well suited to storing statistics.

    It's ad-hoc statistic, like the unknown data range, that are trickier. Usually this comes down to storing the right values to allow you to get rid of the need to read an unknown number of documents to calculate a value.

    For example, if you store counters for the statistics per month, week, day and hour, you can satisfy a wide range of date ranges with a limited number of read operations. You may need to read multiple documents, but the number of documents to read depends on the range, and not on the total number of documents in the database.

    Of course, for the most flexible ad-hoc querying, you may still want to consider another solution, such as BigQuery, which was made precisely for this use-case.