
Mongoexport to multiple csv files


I have a large MongoDB collection. I want to export this collection to CSV so that I can import it into a statistics package for data analysis.

The collection holds about 15 GB of documents. I would like to split it into ~100 equally sized CSV files. Is there any way to achieve this using mongoexport? I could also query the whole collection with pymongo, split it, and write the CSV files manually, but I guess this would be slower and would require more coding.

Thank you for your input.


Solution

  • You can do this with mongoexport's --skip and --limit options.

    For example, if you know that your collection holds 1,000 documents, you can export it in batches with a script loop (pseudo code):

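    loops = 100
    count = db.collection.count()
    batch_size = ceil(count / loops)    // round up so the last few documents are not dropped
    
    // "..." stands for the remaining options; CSV output also needs a field list (--fields)
    for (i = 0; i < loops; i++) {
        mongoexport --type=csv --skip (batch_size * i) --limit batch_size --out export${i}.csv ...
    }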
    

    This assumes that your documents are roughly equal in size, so that batches with equal document counts produce files of roughly equal size.

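    Note, however, that large skips are slow, because the server still has to walk past every skipped document before it returns the first result.

    As a consequence, the early iterations (small skip values) will run faster than the later ones (large skip values).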
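
    The loop above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the helper names plan_batches and export_command, the database name "mydb", the collection name "mycoll" and the field list "_id" are all made up for the example and are not part of the original answer. The sketch only builds the command lines and does not run them. It also adds a --sort on _id, on the assumption that a fixed order is wanted so that separate runs do not overlap or miss documents.

```python
import shlex


def plan_batches(count, loops):
    """Return (skip, limit) pairs covering `count` documents in at most `loops` batches."""
    batch_size = -(-count // loops)  # ceiling division, so the remainder is not dropped
    batches = []
    for i in range(loops):
        skip = batch_size * i
        if skip >= count:
            break  # fewer batches are needed when count does not fill all loops
        batches.append((skip, min(batch_size, count - skip)))
    return batches


def export_command(i, skip, limit, db="mydb", collection="mycoll", fields="_id"):
    """Build one mongoexport command line; db, collection and fields are placeholders."""
    args = [
        "mongoexport",
        "--db", db,
        "--collection", collection,
        "--type=csv",
        "--fields", fields,          # CSV export needs an explicit field list
        "--sort", '{"_id": 1}',      # assumption: fixed order so batches do not overlap
        "--skip", str(skip),
        "--limit", str(limit),
        "--out", f"export{i}.csv",
    ]
    return shlex.join(args)


if __name__ == "__main__":
    # 1,000 documents is the example figure used in the answer above.
    for i, (skip, limit) in enumerate(plan_batches(1000, 100)):
        print(export_command(i, skip, limit))
```

    Printing the commands first makes it easy to check the skip and limit values before running anything; each line could then be executed from a shell or passed to subprocess.run.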