python pandas mongodb mongodb-query sharding

How to query a very large sharded MongoDB database (3 shards of around 260-300 million documents each)

I need to find documents that fall within different date ranges in a sharded database with a total of around 800 million documents. I am using this query:

cursordata=event.aggregate([{"$match":{}},{"$unwind":},{"$project":{}}])

However, when I convert the cursor to a pandas DataFrame:

df=pd.DataFrame(cursordata)

It takes forever and never finishes; it just hangs.

I have 2 choices:

Either keep querying MongoDB directly for each set of conditions, or
Convert the data to a DataFrame first, then apply the different conditions in pandas

Please suggest how to proceed.

Solution

Could we have a sample of documents? I think you should look for an index matching the fields you're querying.

As a reminder, try to keep in mind the Equality, Sort, Range rule in MongoDB indexing.
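
To make the Equality, Sort, Range ordering concrete, here is a minimal sketch that builds the key list for a compound index in that order. The field names (`event_type`, `user_id`, `event_date`) are hypothetical, since the question does not show the documents, and the helper `esr_index_keys` is only an illustration. The resulting list of `(field, direction)` tuples is the format pymongo's `create_index` accepts.

```python
ASCENDING = 1  # same value as pymongo.ASCENDING


def esr_index_keys(equality, sort, range_):
    """Return compound index keys ordered Equality, then Sort, then Range."""
    ordered = []
    seen = set()
    for field in list(equality) + list(sort) + list(range_):
        # Keep the first occurrence only, so a field is never indexed twice.
        if field not in seen:
            seen.add(field)
            ordered.append((field, ASCENDING))
    return ordered


# Hypothetical fields: exact match on event_type, sort on user_id,
# date range filter on event_date.
keys = esr_index_keys(
    equality=["event_type"],
    sort=["user_id"],
    range_=["event_date"],
)
print(keys)

# With a live pymongo collection (not executed here):
# event.create_index(keys)
```

If the query only filters on a date range and never sorts, the sort list can be empty and the date field simply comes after any equality fields.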
Besides, since you're on a sharded cluster, you may want to include your shard key in your query; otherwise mongos will query all the shards.
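
As a rough sketch of what including the shard key looks like, the snippet below builds a `$match` stage that combines an equality condition on the shard key with a date range. The shard key name `region`, its value, and the date field `event_date` are assumptions made for illustration; substitute the real shard key of the collection. The helper `targeted_match` is hypothetical and not part of pymongo.

```python
from datetime import datetime


def targeted_match(shard_key, shard_value, date_field, start, end):
    """Build a $match stage: equality on the shard key plus a half-open date range."""
    return {
        "$match": {
            shard_key: shard_value,                     # lets mongos target a shard
            date_field: {"$gte": start, "$lt": end},    # start inclusive, end exclusive
        }
    }


# Hypothetical shard key "region" and date field "event_date".
pipeline = [
    targeted_match(
        "region", "EU", "event_date",
        datetime(2020, 1, 1), datetime(2020, 2, 1),
    )
]
print(pipeline)

# With a live pymongo collection (not executed here):
# cursordata = event.aggregate(pipeline)
```

Keeping the `$match` as the first pipeline stage means the filtering happens on the shards before any `$unwind` or `$project`, so far fewer documents need to reach the client and the DataFrame.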