I have a large MongoDB collection. I want to export this collection to CSV so I can then import it into a statistics package to do data analysis.
The collection holds about 15 GB of documents. I would like to split it into ~100 equally sized CSV files. Is there any way to achieve this using mongoexport? I could also query the whole collection in pymongo, split it, and write the CSV files manually, but I suspect this would be slower and require more code.
Thanks for any input.
You can do it using the --skip and --limit options.
For example, if your collection holds 1,000 documents, you can export it with a script loop (pseudo code):
loops = 100
count = db.collection.count()
batch_size = ceil(count / loops)   # round up, or the remainder is never exported

for (i = 0; i < loops; i++) {
    mongoexport --skip (batch_size * i) --limit batch_size \
        --type=csv --fields <field1,field2,...> --out export${i}.csv ...
}
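One detail worth checking before running the loop: if batch_size is computed with plain integer division, batch_size * loops can be smaller than count, and the last few documents are silently skipped. A small Python sketch (the function name batch_ranges is hypothetical) that computes non-overlapping (skip, limit) pairs covering every document:

```python
def batch_ranges(count, loops):
    """Yield (skip, limit) pairs that partition `count` documents into
    at most `loops` batches. Ceiling division sizes the batches so the
    remainder is not lost to integer truncation."""
    batch_size = -(-count // loops)  # ceiling division
    for skip in range(0, count, batch_size):
        # the final batch may be smaller than batch_size
        yield skip, min(batch_size, count - skip)

# Each pair maps onto one mongoexport invocation, e.g.:
#   mongoexport --skip <skip> --limit <limit> --out export<i>.csv ...
```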
This assumes your documents are roughly equal in size.
Note, however, that large skips are slow: the server still has to walk past every skipped document, so the low-skip iterations at the start will run noticeably faster than the high-skip iterations at the end.
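If those large skips become the bottleneck, an alternative (using pymongo, which you mention, rather than mongoexport) is to paginate on the _id field, which is indexed by default: each query seeks directly past the last _id seen, so the Nth chunk costs about the same as the first. A sketch under that assumption; export_in_id_batches and the field names are hypothetical, and `collection` is expected to behave like a pymongo Collection:

```python
import csv

def write_csv(path, fieldnames, docs):
    """Write one batch of documents to a CSV file, keeping only `fieldnames`."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows({k: doc.get(k, "") for k in fieldnames} for doc in docs)

def export_in_id_batches(collection, fieldnames, batch_size,
                         path_pattern="export{:03d}.csv"):
    """Export `collection` to CSV chunks by range-querying on _id
    instead of using skip. Returns the number of files written."""
    last_id = None
    i = 0
    while True:
        # Seek to just past the previous batch; no documents are scanned
        # and discarded the way they are with skip.
        query = {} if last_id is None else {"_id": {"$gt": last_id}}
        batch = list(collection.find(query).sort("_id", 1).limit(batch_size))
        if not batch:
            break
        write_csv(path_pattern.format(i), fieldnames, batch)
        last_id = batch[-1]["_id"]
        i += 1
    return i
```

The chunk count here follows from batch_size (count divided by 100, rounded up) rather than being fixed at exactly 100 files, which is usually acceptable for "roughly 100 equally sized files".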