mongodb, sharding, gridfs, replicaset

Should I use sharding in MongoDB or run multiple instances of MongoDB?


Issue

I have at least 10 text files (CSV), each up to 5 GB in size. Importing the first file works without problems, but when I start importing the second one I get an error about the maximum size limit (16 MB).

My primary purpose for the database is to search for customers using the customer_id index.

Below are the details of one CSV file:

Collection Name | Documents | Avg. Document Size | Total Document Size | Num. Indexes | Total Index Size | Properties
Customers       | 8,874,412 | 1.8 KB             | 15.7 GB             | 3            | 262.0 MB         |
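A little arithmetic on these figures helps locate the 16 MB limit. The sketch below assumes the "15.7 GB" figure uses decimal gigabytes (10^9 bytes) and takes 16 MiB (16,777,216 bytes) as MongoDB's maximum BSON document size. It shows that an average document is only about 1.8 KB, thousands of times below the limit, while the collection as a whole is already hundreds of times larger than 16 MB. The limit therefore applies to a single document, not to a collection.

```python
# Worked check of the collection statistics from the table above.
MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024  # MongoDB's per-document size limit (16 MiB)

documents = 8_874_412
total_size_bytes = 15.7e9  # assumption: "15.7 GB" means decimal gigabytes

avg_doc_bytes = total_size_bytes / documents            # roughly 1.77 KB per document
headroom = MAX_BSON_DOCUMENT_BYTES / avg_doc_bytes      # how much smaller than 16 MiB one doc is
collection_vs_limit = total_size_bytes / MAX_BSON_DOCUMENT_BYTES  # collection size vs. 16 MiB

print(f"average document: {avg_doc_bytes:,.0f} bytes")
print(f"an average document is ~{headroom:,.0f}x smaller than the 16 MiB limit")
print(f"the whole collection is ~{collection_vs_limit:,.0f}x larger than 16 MiB")
```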

To overcome this, the MongoDB community recommended GridFS, but the problem with GridFS is that the data is stored as raw bytes, so it is not possible to query for a specific index in the text file.

I don't know whether it is possible to query for a specific index in a text file when using GridFS. If someone knows, any help is appreciated.
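To see why GridFS does not help with per-customer queries, here is a simplified, standard-library model of what GridFS does with a file: it splits the raw bytes into fixed-size chunks (255 KiB by default) and stores each chunk as a separate document. The field names files_id, n and data mirror the GridFS chunks collection, but split_into_chunks is a hypothetical illustration, not the driver's API. Because chunk boundaries fall at arbitrary byte offsets, often in the middle of a CSV row, no chunk corresponds to a customer and customer_id never becomes a queryable field.

```python
import uuid

GRIDFS_DEFAULT_CHUNK_SIZE = 255 * 1024  # GridFS default chunk size (255 KiB)


def split_into_chunks(data: bytes, chunk_size: int = GRIDFS_DEFAULT_CHUNK_SIZE):
    """Simplified model of GridFS storage: one document per fixed-size slice of bytes.

    Hypothetical helper for illustration; real code would use the driver's GridFS API.
    """
    files_id = uuid.uuid4().hex  # stands in for the ObjectId GridFS assigns to the file
    return [
        {"files_id": files_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]


# A tiny CSV with a deliberately small chunk size so the split is visible.
csv_bytes = b"customer_id,name\n1001,Alice\n1002,Bob\n"
chunks = split_into_chunks(csv_bytes, chunk_size=16)

for chunk in chunks:
    print(chunk["n"], chunk["data"])
# Rows are cut at arbitrary byte offsets; no chunk document has a customer_id field.
```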

Another solution I considered was running multiple instances of MongoDB on different ports. Is this approach feasible?

  1. But most of the tutorials on multiple instances show how to create a replica set, which stores the same data on the PRIMARY and the SECONDARY.
  2. The SECONDARY instances don't accept writes; they only allow reads.

Is it possible to run multiple instances of MongoDB without creating a replica set, with both read and write operations on each of them? If yes, how? Can this method overcome the 16 MB limit?
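Several standalone mongod processes can run side by side on different ports, but they are independent: MongoDB will not spread the data across them, so the application has to decide which instance holds each customer. A minimal sketch of that routing logic, assuming hypothetical ports 27017 to 27019 and a hash of customer_id, is shown below. Each customer document still lives whole on one instance, so this approach does not change the 16 MB per-document limit.

```python
import hashlib

# Hypothetical standalone instances, each started with its own port and data directory.
INSTANCE_PORTS = [27017, 27018, 27019]


def port_for_customer(customer_id: str) -> int:
    """Pick the instance that stores this customer (application-side partitioning).

    A stable hash (not Python's per-process randomized hash()) keeps the mapping
    identical across runs, so reads go to the same instance that received the write.
    """
    digest = hashlib.sha1(customer_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(INSTANCE_PORTS)
    return INSTANCE_PORTS[index]


for cid in ["C0001", "C0002", "C0003", "C0004"]:
    print(cid, "->", port_for_customer(cid))
```

One drawback of this design: adding an instance changes the modulus and remaps most customers to a different port. Rebalancing data when the cluster grows is part of what MongoDB's built-in sharding takes care of.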

The second solution I considered was sharding the collection. Can this method overcome the 16 MB limit? If yes, any help with this is appreciated.
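Sharding does not raise the 16 MB limit either: it distributes whole documents across shards according to a shard key, and every individual document is still subject to the limit. Below is a simplified model of range-based sharding on customer_id; the chunk boundaries and shard names are invented for illustration, and in a real cluster the routing is done by mongos from the cluster's chunk metadata. It does show why a lookup by shard key can be sent to a single shard, which is what makes customer_id a plausible shard key for this search-by-customer workload.

```python
import bisect

# Hypothetical chunk map: chunk i covers [CHUNK_LOWER_BOUNDS[i], CHUNK_LOWER_BOUNDS[i + 1]).
CHUNK_LOWER_BOUNDS = ["", "C3000000", "C6000000"]  # "" acts as the minimum key
CHUNK_SHARDS = ["shardA", "shardB", "shardC"]


def shard_for(customer_id: str) -> str:
    """Return the shard whose chunk range contains customer_id (simplified routing)."""
    position = bisect.bisect_right(CHUNK_LOWER_BOUNDS, customer_id) - 1
    return CHUNK_SHARDS[position]


# Each lookup by customer_id is routed to exactly one shard.
print(shard_for("C0000042"))  # shardA
print(shard_for("C4500000"))  # shardB
print(shard_for("C9999999"))  # shardC
```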

Of the two solutions, which is more efficient (in terms of speed) for searching data? As I mentioned earlier, I just want to search for customers in this database.

The Error


Solution

  • The error message shows exactly where the problem is: entry #8437: line 13530, column 627.

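    Look at that line and column in the file and correct the data there.

    The error message 'extraneous " in field ...' is quite clear. Your CSV file contains an opening quote " that is never closed, so the rest of the entire file is read as one single field. A field that large easily exceeds MongoDB's 16 MB maximum document size, which explains the size-limit error.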
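If the file is too large to inspect by hand, a short script can locate the quote that opens a field but is never closed. The helper below, find_unclosed_quote, is hypothetical and not part of mongoimport. It assumes RFC 4180-style quoting, where a literal quote inside a quoted field is written as two quotes (""), and it reports the 1-based line and column of the quote that is still open at the end of the input. A stray quote inside an unquoted field (which RFC 4180 does not allow) is also reported, since that is exactly the kind of character that swallows the rest of the file.

```python
import io


def find_unclosed_quote(lines):
    """Return (line, column) of a quote that opens a field and is never closed, else None.

    Hypothetical helper: assumes RFC 4180-style quoting, where "" inside a quoted
    field is an escaped literal quote. Input is consumed line by line, so a large
    file never has to fit in memory.
    """
    in_quotes = False
    opened_at = None
    for line_no, line in enumerate(lines, start=1):
        col = 0
        while col < len(line):
            if line[col] == '"':
                if not in_quotes:
                    in_quotes = True
                    opened_at = (line_no, col + 1)  # report 1-based columns
                elif col + 1 < len(line) and line[col + 1] == '"':
                    col += 1  # skip the second quote of an escaped "" pair
                else:
                    in_quotes = False  # closing quote of the field
            col += 1
    return opened_at if in_quotes else None


# Line 3 uses a properly escaped quote; line 4 opens a quote that is never closed.
sample = io.StringIO('id,name\n1,"Alice"\n2,"say ""hi"""\n3,"Bob\n4,Carol\n')
print(find_unclosed_quote(sample))  # (4, 3)
```

To scan the real file, open it with open(path, newline="", encoding="utf-8") (the encoding is an assumption) and pass the file object to the function.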