mongodb screen-scraping nosql

Is it OK to use MongoDB when we have no idea about the available keys?


We are scraping a very large product website.

So we will fetch and persist a large number of products, and almost every product has a different set of features/details.

Naturally, we are considering a NoSQL database (MongoDB) for this job. We will create a "products" collection with one document per product, where each key/value pair maps to a detail_name/detail_description of the product.

Since the products are quite different, we have almost no idea what their details/features will be. In other words, we do not know the available keys in advance.

According to this link, "MongoDB case insensitive key search", not having some idea of the available keys is a "gap" for MongoDB.

Is this true? If yes, what are the alternatives?


Solution

  • Your key problem isn't much of an issue for MongoDB, provided you can live with a slightly different schema and big indexes:

    Normally you would do something like:

    {
        productId :..
        details : {
            detailName1 : detailValue1,
            detailName2 : detailValue2
        }
    }
    

    But if you store the details as an array of field/value pairs instead, you can index the details field:

    {
        productId :..
        details : [
            {field : detailName1, value : detailValue1},
            {field : detailName2, value : detailValue2}
        ]
    }
    

    Do note that this will result in a very big index. That is not necessarily a problem, but it is something to be aware of. The index will then be {"details.field": 1, "details.value": 1} (or just {details: 1} if you're not adding additional fields per detail). The dotted key names need quotes when you type them in the shell.
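
To make the array schema more concrete, here is a minimal sketch in plain JavaScript. It assumes the scraper produces a flat object of detail names and values; the helper names (toDetailsArray, detailFilter), the sample product, and the "products" collection handle in the trailing comments are illustrative assumptions, not part of any MongoDB API. Only the createIndex and find calls mentioned in the comments are real driver/shell methods, and they are left as comments so the snippet runs without a server.

```javascript
// Hypothetical helper: reshape a flat details object such as
// { color: "red" } into the [{ field, value }] array form, so that
// unknown keys become data instead of part of the schema.
function toDetailsArray(details) {
  return Object.entries(details).map(([field, value]) => ({ field, value }));
}

// Index specification over the array's sub-fields. Both keys come from
// the same array, so this is a single compound multikey index.
const detailsIndexSpec = { "details.field": 1, "details.value": 1 };

// Hypothetical helper: build a filter that matches products having ONE
// array element with both the given field name and the given value.
function detailFilter(field, value) {
  return { details: { $elemMatch: { field, value } } };
}

// Sample document built from made-up scraped data.
const product = {
  productId: 1,
  details: toDetailsArray({ color: "red", weight: "2kg" }),
};

console.log(JSON.stringify(product, null, 2));
console.log(JSON.stringify(detailFilter("color", "red")));

// With a live connection (mongosh or the Node.js driver) these would be
// used roughly as follows:
//   db.products.createIndex(detailsIndexSpec)
//   db.products.insertOne(product)
//   db.products.find(detailFilter("color", "red"))
```

The $elemMatch operator matters here: querying "details.field" and "details.value" as two separate conditions could match a product where one detail has the right name and a different detail has the right value.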