Tags: mongodb, performance, mongodb-query, requirements

Need answers regarding MongoDB setup on a local machine for a lot of data


I have more than 200 GB of data in JSON and CSV format, with more than 300 million rows (documents).

I want to store it in a MongoDB database. I want to know what machine is required to handle this workload: storing, retrieving, and manipulating the data. Also, how long would it take to search across the whole data set?


Solution

  • IMO, the technical choice depends on your data structure and how you use the data. The answer below assumes you store all the data in a single collection, in a single mongodb instance, on a single machine.


    I did an experiment in the past to test the performance of mongodb with a large data set. I will share the results with you.

    Data volume

    • Amount of data: 1 billion documents
    • Document format: 4 fields (ObjectId + Int + String + Date), ~200 bytes/document (a minimal pymongo sketch follows this list)
    • All documents are stored in one collection
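
    The original field names are not given, so purely as an illustration, here is a minimal sketch of what one such document and its insertion might look like with pymongo, assuming hypothetical field names value, tag, and created_at:

```python
from datetime import datetime, timezone

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
coll = client["testdb"]["perf_test"]

# One document: ObjectId (_id, added automatically by the driver) + Int +
# String + Date -- roughly 200 bytes per document.
doc = {
    "value": 42,                               # Int field
    "tag": "some-string-key",                  # String field
    "created_at": datetime.now(timezone.utc),  # Date field
}
coll.insert_one(doc)
```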

    Hardware

    • CPU: Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz (4 cores)
    • RAM: 32GB
    • Disk: 2TB LSI MRSASRoMB-8i SCSI Disk Device

    Software

    • OS: Red Hat Server 6.4 x86-64 with ext4
    • MongoDB: 3.2 x64 (engine: WiredTiger, cacheSize set to 28GB; see the config sketch below)
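
    For reference, that 28GB cache size can be set in mongod.conf through the WiredTiger storage options; the value here simply mirrors the test machine above and is not a general recommendation:

```yaml
# mongod.conf (WiredTiger engine)
storage:
  engine: wiredTiger
  wiredTiger:
    engineConfig:
      cacheSizeGB: 28
```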

    Test result

    Insert performance

    Before index creation: no additional index (only the default _id index).
    After index creation: one additional index on the string field.
    (A rough pymongo sketch of the insert harness follows the table below.)

    ╔══════════════════════╦═══════════════════════╦══════════════════════╗
    ║                      ║ Before index creation ║ After index creation ║
    ╠══════════════════════╬═══════════════════════╬══════════════════════╣
    ║ Single thread insert ║ 656/s - 746/s         ║ 534/s - 712/s        ║
    ║ 10 Threads insert    ║ 3817/s - 3964/s       ║ 3306/s - 3389/s      ║
    ╚══════════════════════╩═══════════════════════╩══════════════════════╝
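
    As an illustration of how the 10-thread insert test could be driven from pymongo; batch size, loop counts, and field names are placeholders, not the original test harness:

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timezone

from pymongo import MongoClient

# MongoClient is thread-safe and pools connections, so one shared client
# can serve all worker threads.
client = MongoClient("mongodb://localhost:27017")
coll = client["testdb"]["perf_test"]

def insert_batch(start: int, n_docs: int) -> None:
    # insert_many amortizes the per-request overhead compared to insert_one.
    docs = [
        {"value": i, "tag": f"key-{i}", "created_at": datetime.now(timezone.utc)}
        for i in range(start, start + n_docs)
    ]
    coll.insert_many(docs, ordered=False)

# 10 concurrent writer threads, mirroring the "10 Threads insert" row above.
with ThreadPoolExecutor(max_workers=10) as pool:
    futures = [pool.submit(insert_batch, i * 1000, 1000) for i in range(100)]
    for f in futures:
        f.result()  # surface any insert errors
```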
    

    Query performance

    Query by the string field.

    ╔═══════════════════╦═══════════════════════╦══════════════════════╗
    ║                   ║ Before index creation ║ After index creation ║
    ╠═══════════════════╬═══════════════════════╬══════════════════════╣
    ║ Return 1 document ║ 1268904 ms            ║ 15 ms                ║
    ╚═══════════════════╩═══════════════════════╩══════════════════════╝
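
    The query in the table is a simple equality lookup on the string field, roughly equivalent to the following (collection and field names as assumed in the sketches above):

```python
from pymongo import MongoClient

coll = MongoClient("mongodb://localhost:27017")["testdb"]["perf_test"]

# Without a secondary index this is a full collection scan (~21 minutes at
# 1 billion documents in the test above); with an index on "tag" the same
# lookup returns in milliseconds.
print(coll.find_one({"tag": "key-123456"}))
```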
    

    Build index

    If the index on the string field is built after 1 billion documents are already in the collection, it takes ~3 hours to finish.
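
    From pymongo, creating that secondary index looks like the following; on MongoDB 3.2, passing background=True keeps the database available during the ~3 hour build at the cost of a slower build (field name assumed as above):

```python
from pymongo import ASCENDING, MongoClient

coll = MongoClient("mongodb://localhost:27017")["testdb"]["perf_test"]

# Foreground index builds on MongoDB 3.2 block the database for the whole
# build; a background build is slower but non-blocking.
coll.create_index([("tag", ASCENDING)], background=True)
```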

    RAM consumption

    In the insert test, once all the cache (28GB) is used up, the insert speed drops.
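
    One way to watch the cache fill up during such a run is the serverStatus command; its WiredTiger cache section reports both the configured maximum and the bytes currently cached:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
cache = client.admin.command("serverStatus")["wiredTiger"]["cache"]

# Once "bytes currently in the cache" approaches "maximum bytes configured"
# (28GB here), WiredTiger starts evicting pages and insert throughput drops.
print(cache["maximum bytes configured"])
print(cache["bytes currently in the cache"])
```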

    Conclusion

    1. No big difference in insert performance before vs. after index creation (in my setup; I am not sure how it behaves with many indexes).

    2. Mongodb tends to use as much RAM as it can; if you have a large hot data set, you'd better give it plenty of RAM.

    3. With a good index, query performance is good even at the billion-document level.

    4. Building an index on a large collection will cost you a lot of time.