Tags: mongodb, performance, mongodb-query, requirements

Need answers regarding MongoDB setup on a local machine for a lot of data


I have more than 200 GB of data in JSON and CSV format, with more than 300 million rows (documents).

I want to store it in a MongoDB database. I want to know what machine is required to handle this workload: storing, retrieving, and manipulating the data. Also, how long would it take to search across the whole data set?


Solution

  • IMO, the technical choice depends on your data structure and how you use the data. The answer below assumes you store all the data in a single collection, in a single mongodb instance, on a single machine.


    I did an experiment in the past to test the performance of mongodb with a large data set. I will share the results with you.

    Data volume

    • Amount of data: 1 billion documents
    • Document format: 4 fields (ObjectId + Int + String + Date), ~200 bytes/document (a minimal pymongo sketch follows this list)
    • All documents are stored in one collection
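
    The original field names are not given, so purely as an illustration, here is a minimal sketch of what one such document and its insertion might look like with pymongo, assuming hypothetical field names value, tag, and created_at:

```python
from datetime import datetime, timezone

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
coll = client["testdb"]["perf_test"]

# One document: ObjectId (_id, added automatically by the driver) + Int +
# String + Date -- roughly 200 bytes per document.
doc = {
    "value": 42,                               # Int field
    "tag": "some-string-key",                  # String field
    "created_at": datetime.now(timezone.utc),  # Date field
}
coll.insert_one(doc)
```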

    Hardware

    • CPU: Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz (4 cores)
    • RAM: 32GB
    • Disk: 2TB LSI MRSASRoMB-8i SCSI Disk Device

    Software

    • OS: Red Hat Server 6.4 x86-64 with ext4
    • MongoDB: 3.2 x64 (engine: WiredTiger, cacheSize set to 28GB; see the config sketch below)
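
    For reference, that 28GB cache size can be set in mongod.conf through the WiredTiger storage options; the value here simply mirrors the test machine above and is not a general recommendation:

```yaml
# mongod.conf (WiredTiger engine)
storage:
  engine: wiredTiger
  wiredTiger:
    engineConfig:
      cacheSizeGB: 28
```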

    Test result

    Insert performance

    Before index creation: no additional index (only the default _id index).
    After index creation: one additional index on the string field.
    (A rough pymongo sketch of the insert harness follows the table below.)

    ╔══════════════════════╦═══════════════════════╦══════════════════════╗
    ║                      ║ Before index creation ║ After index creation ║
    ╠══════════════════════╬═══════════════════════╬══════════════════════╣
    ║ Single thread insert ║ 656/s - 746/s         ║ 534/s - 712/s        ║
    ║ 10 Threads insert    ║ 3817/s - 3964/s       ║ 3306/s - 3389/s      ║
    ╚══════════════════════╩═══════════════════════╩══════════════════════╝
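
    As an illustration of how the 10-thread insert test could be driven from pymongo; batch size, loop counts, and field names are placeholders, not the original test harness:

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timezone

from pymongo import MongoClient

# MongoClient is thread-safe and pools connections, so one shared client
# can serve all worker threads.
client = MongoClient("mongodb://localhost:27017")
coll = client["testdb"]["perf_test"]

def insert_batch(start: int, n_docs: int) -> None:
    # insert_many amortizes the per-request overhead compared to insert_one.
    docs = [
        {"value": i, "tag": f"key-{i}", "created_at": datetime.now(timezone.utc)}
        for i in range(start, start + n_docs)
    ]
    coll.insert_many(docs, ordered=False)

# 10 concurrent writer threads, mirroring the "10 Threads insert" row above.
with ThreadPoolExecutor(max_workers=10) as pool:
    futures = [pool.submit(insert_batch, i * 1000, 1000) for i in range(100)]
    for f in futures:
        f.result()  # surface any insert errors
```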
    

    Query performance

    Query by the string field.

    ╔═══════════════════╦═══════════════════════╦══════════════════════╗
    ║                   ║ Before index creation ║ After index creation ║
    ╠═══════════════════╬═══════════════════════╬══════════════════════╣
    ║ Return 1 document ║ 1268904 ms            ║ 15 ms                ║
    ╚═══════════════════╩═══════════════════════╩══════════════════════╝
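
    The query in the table is a simple equality lookup on the string field, roughly equivalent to the following (collection and field names as assumed in the sketches above):

```python
from pymongo import MongoClient

coll = MongoClient("mongodb://localhost:27017")["testdb"]["perf_test"]

# Without a secondary index this is a full collection scan (~21 minutes at
# 1 billion documents in the test above); with an index on "tag" the same
# lookup returns in milliseconds.
print(coll.find_one({"tag": "key-123456"}))
```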
    

    Build index

    If the index on the string field is built after 1 billion documents are already in the collection, it takes ~3 hours to finish.
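
    From pymongo, creating that secondary index looks like the following; on MongoDB 3.2, passing background=True keeps the database available during the ~3 hour build at the cost of a slower build (field name assumed as above):

```python
from pymongo import ASCENDING, MongoClient

coll = MongoClient("mongodb://localhost:27017")["testdb"]["perf_test"]

# Foreground index builds on MongoDB 3.2 block the database for the whole
# build; a background build is slower but non-blocking.
coll.create_index([("tag", ASCENDING)], background=True)
```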

    RAM consumption

    In the insert test, once all the cache (28GB) is used up, the insert speed drops.
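
    One way to watch the cache fill up during such a run is the serverStatus command; its WiredTiger cache section reports both the configured maximum and the bytes currently cached:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
cache = client.admin.command("serverStatus")["wiredTiger"]["cache"]

# Once "bytes currently in the cache" approaches "maximum bytes configured"
# (28GB here), WiredTiger starts evicting pages and insert throughput drops.
print(cache["maximum bytes configured"])
print(cache["bytes currently in the cache"])
```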

    Conclusion

    1. No big difference in insert performance before vs. after index creation (in my setup; I am not sure how it behaves with many indexes).

    2. Mongodb tends to use as much RAM as it can; if you have a large hot data set, you'd better give it plenty of RAM.

    3. With a good index, query performance is good even at the billion-document level.

    4. Building an index on a large collection will cost you a lot of time.