I need to find data between different date ranges (on a date column) in a sharded database with around 800 million documents in total. I am using this query:
cursordata = event.aggregate([{"$match": {}}, {"$unwind": ...}, {"$project": {}}])
However, when I convert the result to a pandas DataFrame with
df = pd.DataFrame(cursordata)
it takes forever and never completes; it just hangs.
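One common way to avoid building a single huge DataFrame at once is to consume the cursor in fixed-size batches. This is a minimal sketch, assuming `cursordata` is the iterable cursor returned by `aggregate()`; the helper name `iter_batches` and the batch size are hypothetical. It only helps if each batch fits in memory, and with hundreds of millions of documents, narrowing the `$match` usually matters more.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def iter_batches(cursor: Iterable[Dict], batch_size: int = 50_000) -> Iterator[List[Dict]]:
    """Yield lists of at most `batch_size` documents from any iterable cursor.

    `iter_batches` is a hypothetical helper, not part of pymongo or pandas.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(cursor)
    while True:
        # islice pulls the next batch_size documents without loading the rest.
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# Usage with pymongo/pandas (not executed here):
# import pandas as pd
# for docs in iter_batches(cursordata, batch_size=50_000):
#     df = pd.DataFrame(docs)
#     ...  # process or save this chunk before fetching the next one
```

Processing chunk by chunk keeps memory bounded, at the cost of not having all rows in one DataFrame at the end.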
I have 2 choices:
Please suggest how to proceed.
Could you share a sample document? I think you should check that there is an index matching the fields you're querying.
As a reminder, keep in mind the Equality, Sort, Range (ESR) rule when designing compound indexes in MongoDB.
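To illustrate the ESR rule, here is a small sketch that orders compound-index keys as equality fields first, then sort fields, then range fields, producing a list of `(field, direction)` pairs in the form pymongo's `create_index` accepts. The field names `status` and `created_at` and the helper `esr_index_keys` are hypothetical; substitute the fields your `$match` actually filters on.

```python
from typing import List, Sequence, Tuple

ASCENDING = 1  # same value as pymongo.ASCENDING


def esr_index_keys(
    equality: Sequence[str],
    sort: Sequence[Tuple[str, int]],
    range_: Sequence[str],
) -> List[Tuple[str, int]]:
    """Order compound-index keys by the Equality, Sort, Range rule (hypothetical helper)."""
    keys: List[Tuple[str, int]] = []
    seen = set()
    # 1. Fields matched by exact value come first.
    for field in equality:
        keys.append((field, ASCENDING))
        seen.add(field)
    # 2. Then fields used for sorting, keeping their sort direction.
    for field, direction in sort:
        if field not in seen:
            keys.append((field, direction))
            seen.add(field)
    # 3. Range-filtered fields (e.g. a date between two bounds) go last.
    for field in range_:
        if field not in seen:
            keys.append((field, ASCENDING))
            seen.add(field)
    return keys


keys = esr_index_keys(equality=["status"], sort=[], range_=["created_at"])
print(keys)  # [('status', 1), ('created_at', 1)]
# With pymongo (not executed here): event.create_index(keys)
```

Putting the date range last lets the index narrow by exact values first and then scan one contiguous key range.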
Also, since you're on a sharded cluster, you should include the shard key in your query; otherwise mongos will send the query to every shard (more info here).
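A minimal sketch of a `$match` stage that carries the shard key together with the date range, meant to go first in the pipeline so it can use an index and so mongos can route it to the relevant shards. The shard key `customer_id`, the date field `created_at`, and the helper `date_range_match` are hypothetical; the query is only targeted if you actually know the shard key value. The bounds are `datetime` objects so pymongo sends them as BSON dates.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def date_range_match(
    date_field: str,
    start: datetime,
    end: datetime,
    shard_key: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a $match stage for start <= date_field < end (hypothetical helper).

    Including the shard key lets mongos target specific shards
    instead of broadcasting the query to all of them.
    """
    if start >= end:
        raise ValueError("start must be earlier than end")
    condition: Dict[str, Any] = dict(shard_key or {})
    condition[date_field] = {"$gte": start, "$lt": end}
    return {"$match": condition}


stage = date_range_match(
    "created_at",
    datetime(2021, 1, 1, tzinfo=timezone.utc),
    datetime(2021, 2, 1, tzinfo=timezone.utc),
    shard_key={"customer_id": 42},
)
# Place this stage first, followed by your $unwind and $project stages
# (not executed here): cursordata = event.aggregate([stage, ...])
```

Using a half-open range (`$gte` / `$lt`) avoids double-counting documents that fall exactly on a boundary when you query consecutive date ranges.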