How can I leverage Firebase to develop a mixer algorithm similar to Twitter's, which retrieves and ranks discussions from Firestore based on the weight and created_at fields?
I have a discussion collection with the following structure:
import { Timestamp } from "firebase/firestore";
interface Discussion {
  weight: number;
  created_at: Timestamp; // written with serverTimestamp()
}
In Firestore, ordering by a single field is limiting. For example, if we order discussions solely by weight, new posts will never have the opportunity to rise in the ranking.
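To illustrate why weight alone starves new posts, here is a minimal sketch of a time-decayed score loosely modeled on the Hacker News "gravity" formula. The RankedItem type, the 2-hour offset, and the 1.5 exponent are assumptions for illustration, not part of Firestore or of the data model above. Because the score depends on the current time, the relative order of two posts can change as they age, so it cannot be written to a document once and sorted with orderBy.

```typescript
// Hypothetical item shape used only for this illustration.
interface RankedItem {
  weight: number;
  createdAtMs: number; // creation time in milliseconds since the Unix epoch
}

const MS_PER_HOUR = 60 * 60 * 1000;
const GRAVITY = 1.5; // assumed decay exponent (Hacker News-style)

// Score shrinks as the item ages: weight / (ageHours + 2) ^ GRAVITY.
function decayedScore(item: RankedItem, nowMs: number): number {
  const ageHours = Math.max(0, (nowMs - item.createdAtMs) / MS_PER_HOUR);
  return item.weight / Math.pow(ageHours + 2, GRAVITY);
}

const now = Date.UTC(2024, 0, 10);
const oldPopular: RankedItem = { weight: 100, createdAtMs: now - 72 * MS_PER_HOUR };
const freshPost: RankedItem = { weight: 5, createdAtMs: now - 1 * MS_PER_HOUR };

// Sort by descending decayed score at the given moment.
const ranked = [oldPopular, freshPost].sort(
  (a, b) => decayedScore(b, now) - decayedScore(a, now)
);
console.log(ranked[0] === freshPost); // prints "true"
```

Here a 1-hour-old post with weight 5 outranks a 3-day-old post with weight 100, whereas ordering by weight alone would always put the older post first.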
If I attempt to order discussions separately by weight and created_at, how can I handle deduplication effectively?
The discussion collection can contain anywhere from 0 to 1 million documents, so I prefer a solution that avoids loading all of them on the client side. Additionally, any changes must be reactive and use the onSnapshot method for real-time updates.
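import { collection, limit, onSnapshot, orderBy, query, Firestore, Timestamp } from "firebase/firestore";
// Assumed to exist elsewhere in the app (not shown in the original snippet):
declare const db: Firestore; // an initialized Firestore instance
declare function setTopPosts(posts: Discussion[]): void; // e.g. a React state setter
declare function setRecentPosts(posts: Discussion[]): void;
interface Discussion {
  weight: number;
  created_at: Timestamp; // written with serverTimestamp()
}
// Returns a cleanup function that detaches both real-time listeners.
function queryDiscussionFromFireStore(): () => void {
  const colRef = collection(db, "discussion");
  // query top discussions: highest weight first, capped (20 is an assumed page size)
  const topPostUnsub = onSnapshot(
    query(colRef, orderBy("weight", "desc"), limit(20)),
    (snapshot) => {
      setTopPosts(snapshot.docs.map((d) => d.data() as Discussion));
    }
  );
  // query recent discussions: newest first
  const recentPostUnsub = onSnapshot(
    query(colRef, orderBy("created_at", "desc"), limit(20)),
    (snapshot) => {
      setRecentPosts(snapshot.docs.map((d) => d.data() as Discussion));
    }
  );
  return () => {
    recentPostUnsub();
    topPostUnsub();
  };
}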
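One way to deduplicate the two listeners' results is to key them by Firestore document id. This sketch assumes each result is mapped together with its id, for example snapshot.docs.map((d) => ({ id: d.id, ...d.data() })). The DiscussionWithId type and the dedupeById helper are hypothetical, and created_at is simplified to a millisecond number so the example runs without Firebase.

```typescript
// Hypothetical shape: the Firestore document id (doc.id) plus the document fields.
interface DiscussionWithId {
  id: string;
  weight: number;
  createdAtMs: number;
}

// Merge several result lists, keeping only the first occurrence of each document id.
function dedupeById(...lists: DiscussionWithId[][]): DiscussionWithId[] {
  const seen = new Map<string, DiscussionWithId>();
  for (const list of lists) {
    for (const item of list) {
      if (!seen.has(item.id)) {
        seen.set(item.id, item);
      }
    }
  }
  // Map preserves insertion order, so earlier lists take precedence.
  return Array.from(seen.values());
}

// The two documents from the example data, returned by both queries.
const top = [
  { id: "a", weight: 5, createdAtMs: 2000 },
  { id: "b", weight: 3, createdAtMs: 2000 },
];
const recent = [
  { id: "b", weight: 3, createdAtMs: 2000 },
  { id: "a", weight: 5, createdAtMs: 2000 },
];
console.log(dedupeById(top, recent).map((d) => d.id).join(",")); // prints "a,b"
```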
queryDiscussionFromFireStore is working fine, but I'm not able to figure out how to handle duplicate data.
Suppose we have the following data:
[
{
weight: 5,
created_at: today_date
},
{
weight: 3,
created_at: today_date
},
]
In this case, both snapshots will respond with the same data.
In the provided code example, the queryDiscussionFromFireStore function retrieves discussions from Firestore with two queries, one ordered by weight and one ordered by created_at, and uses the onSnapshot method to listen for real-time updates on both.
However, this produces duplicate data. A discussion that ranks highly by weight and was also created recently matches both queries, so the "top discussions" query (ordered by weight) and the "recent discussions" query (ordered by creation time) can return the same documents.
In this case, the onSnapshot callbacks for the "top discussions" and "recent discussions" queries both receive the same documents, so each of those discussions is processed twice.
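Because each onSnapshot callback fires independently and always delivers its query's complete current result set (snapshot.docs), one approach is to keep the latest result of each listener and rebuild a single deduplicated list whenever either one changes. The createFeedCombiner helper below is a hypothetical sketch of that idea, with Firebase left out so it runs on its own.

```typescript
// Hypothetical item shape: document id plus the fields needed for display.
interface Item {
  id: string;
  weight: number;
}

type Source = "top" | "recent";

// Remembers the latest result of each listener and emits one deduplicated list
// every time either listener delivers a new snapshot.
function createFeedCombiner(onChange: (items: Item[]) => void) {
  const latest: Record<Source, Item[]> = { top: [], recent: [] };
  return (source: Source, items: Item[]): void => {
    latest[source] = items; // replace, don't append: each snapshot is the full result set
    const byId = new Map<string, Item>();
    for (const item of [...latest.top, ...latest.recent]) {
      if (!byId.has(item.id)) byId.set(item.id, item);
    }
    onChange(Array.from(byId.values()));
  };
}

let feed: Item[] = [];
const update = createFeedCombiner((items) => {
  feed = items;
});
update("top", [{ id: "a", weight: 5 }, { id: "b", weight: 3 }]);
update("recent", [{ id: "b", weight: 3 }, { id: "c", weight: 1 }]);
console.log(feed.map((d) => d.id).join(",")); // prints "a,b,c"
```

In the original listeners, the callbacks would call update("top", ...) and update("recent", ...) instead of setting state directly. Replacing rather than appending each list means a document that drops out of one query disappears from the feed unless the other query still returns it.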
From the Firestore documentation on its query limitations:
In a compound query, range (<, <=, >, >=) and not equals (!=, not-in) comparisons must all filter on the same field.
So each query can only have range filters on a single field, and there is no way to rank results by a combination of multiple fields in a single query. You will have to perform multiple queries and deduplicate the results in your application code.
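Deduplicating is only half of a Twitter-like mixer; the other half is deciding how the separately ranked lists are combined. A simple option, assumed here rather than taken from any Firebase API, is round-robin interleaving: take the next unseen item from the top list, then from the recent list, and so on. mixFeeds is a hypothetical helper.

```typescript
interface FeedItem {
  id: string;
}

// Alternate between ranked lists (round-robin), skipping ids already emitted,
// until `size` items are collected or every list is exhausted.
function mixFeeds<T extends FeedItem>(lists: T[][], size: number): T[] {
  const result: T[] = [];
  const seen = new Set<string>();
  const cursors = lists.map(() => 0);
  while (result.length < size) {
    let advanced = false;
    for (let i = 0; i < lists.length && result.length < size; i++) {
      // Skip items from this list that were already taken from another list.
      while (cursors[i] < lists[i].length && seen.has(lists[i][cursors[i]].id)) {
        cursors[i]++;
      }
      if (cursors[i] < lists[i].length) {
        const item = lists[i][cursors[i]++];
        seen.add(item.id);
        result.push(item);
        advanced = true;
      }
    }
    if (!advanced) break; // every list is exhausted
  }
  return result;
}

const topList = [{ id: "a" }, { id: "b" }, { id: "c" }];
const recentList = [{ id: "d" }, { id: "a" }, { id: "e" }];
console.log(mixFeeds([topList, recentList], 10).map((x) => x.id).join(",")); // prints "a,d,b,e,c"
```

Document "a" appears in both lists but is emitted once. Weighting the mix differently (for example, two top items per recent item) would only change the inner loop.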
That also means there is no way to avoid the extra reads. Theoretically, you could find a way to merge created_at and weight into a single value/property that you can filter on to meet your requirements, but the only real example of something like that I know of is geohashes (which combine the lat/lon values of a point into a single string value that you can filter on to find documents in a region), and I personally don't see an equivalent here.
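As an illustration of what merging created_at and weight into one value could look like, assuming a fixed trade-off between weight and recency is acceptable, the sketch below adapts the "hot" formula from Reddit's open-source ranking code. hotScore, its constants, and any field that would store the result are assumptions, not something Firestore provides.

```typescript
// Reference epoch and scale used by Reddit's published "hot" formula.
const EPOCH_OFFSET_SECONDS = 1134028003;
const RECENCY_SCALE_SECONDS = 45000; // 12.5 hours of recency == 10x weight

// Combine weight and creation time into one sortable number.
function hotScore(weight: number, createdAtSeconds: number): number {
  // log10 compresses weight so each tenfold increase adds 1 to the score.
  const order = Math.log10(Math.max(Math.abs(weight), 1));
  const sign = weight > 0 ? 1 : weight < 0 ? -1 : 0;
  // Newer posts get a larger recency term; it never changes after creation.
  const age = createdAtSeconds - EPOCH_OFFSET_SECONDS;
  return sign * order + age / RECENCY_SCALE_SECONDS;
}

const nowSeconds = Math.floor(Date.now() / 1000);
console.log(hotScore(5, nowSeconds), hotScore(100, nowSeconds - 72 * 3600));
```

This value does not depend on the current time, so newer posts simply start higher. It could, for example, be written to a hypothetical hot_score field whenever weight changes (with createdAtSeconds taken from created_at.seconds) and read with a single listener on query(col_ref, orderBy("hot_score", "desc"), limit(20)), which avoids the duplicate reads. The cost is that 45,000 seconds of recency is always worth exactly a tenfold increase in weight.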