This is my first time making an account on Stack Overflow, so I apologise if what I am asking is really straightforward.
What I want to do: I have a database of 14 million documents of Twitter data that I wish to analyse. I am trying to query only those documents that are in a specific language and export the result to a smaller collection so that I can actually perform my analysis on it.
My issue: I can't seem to run a full query without MongoDB Compass timing out or running indefinitely - I don't know how to make my database smaller, and I can't run my analysis on it without my RAM being overused and my computer crashing.
What I have tried:
Please help me! I am genuinely floored: all my analysis skills are useless because I can't seem to get to the data because of its sheer size :(
If you have any other tips, e.g. don't use MongoDB, use R or Hadoop for Windows or something, please let me know. At this point I'm willing to teach myself anything I can if it helps me get a grip on this dataset!
Thank you!
Add an index on each field that you want to query on and, if you can, give your MongoDB instance or cluster more memory. To create the indexes on your collection, run the following shell commands once:
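// Index the language field so that queries filtering on it can avoid a full collection scan.
db.collection.createIndex({ "language": 1 }, { unique: false })

// Index the nested user.location field in the same way.
db.collection.createIndex({ "user.location": 1 }, { unique: false })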
You don't need to change your query to use the indexes; MongoDB will pick a suitable index for you.
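To get the smaller collection the question asks for, one option is an aggregation that filters with `$match` and writes the result with `$out`. This is only a minimal sketch: it assumes the language code is stored in the `language` field indexed above, and the value `"en"`, the target collection name `tweets_en`, and the helper `buildLanguageExportPipeline` are hypothetical names chosen for illustration.

```javascript
// Build an aggregation pipeline that keeps only documents in one language
// and writes them to a separate collection.
// Assumption: the language code is stored in a top-level "language" field.
function buildLanguageExportPipeline(language, targetCollection) {
  return [
    // $match as the first stage lets MongoDB use the index on "language".
    { $match: { language: language } },
    // $out writes the matching documents to targetCollection on the server.
    { $out: targetCollection }
  ];
}

// Hypothetical values: "en" for English, "tweets_en" for the new collection.
const pipeline = buildLanguageExportPipeline("en", "tweets_en");

// "db" only exists inside the mongo shell, so the call is skipped elsewhere.
if (typeof db !== "undefined") {
  // allowDiskUse lets stages spill to disk instead of failing on memory limits.
  db.collection.aggregate(pipeline, { allowDiskUse: true });
}
```

Note that `$out` replaces the target collection if it already exists, so pick a name you are not using for anything else. The copy happens on the server, so the matching documents are not pulled into your client's memory.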