We are scraping a huge product website.
So we will fetch and persist a very large number of products, and almost every product has a different set of features/details.
Naturally, we are considering a NoSQL database (MongoDB) for this job. We will create a collection "products" with one document per product, where each key/value pair maps to a detail_name/detail_description of the product.
Since the products are so different, we have almost no idea what the product details/features are. In other words, we do not know the available keys in advance.
According to this link (MongoDB case insensitive key search), not knowing the available keys in advance is a "gap" in MongoDB.
Is this true? If yes, what are the alternatives?
Your key problem isn't much of an issue for MongoDB, provided you can live with a slightly different schema and big indexes.
Normally you would do something like:
{
  productId: ..,
  details: {
    detailName1: detailValue1,
    detailName2: detailValue2
  }
}
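But if you store the details like this instead, the detail names become values rather than keys, so you can index the details field and query detail names like any other value: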
{
  productId: ..,
  details: [
    { field: detailName1, value: detailValue1 },
    { field: detailName2, value: detailValue2 }
  ]
}
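If the scraper produces the first (nested object) shape, converting it into the array form is a one-liner per document. The following is a minimal sketch, assuming the scraped details arrive as a plain JavaScript object; `toDetailArray` and `toProductDocument` are hypothetical helper names, not part of any MongoDB API.

```javascript
// Hypothetical helper: turns {detailName: detailValue, ...} into
// [{field: detailName, value: detailValue}, ...] so that detail names
// are stored as ordinary (indexable) values instead of as keys.
function toDetailArray(details) {
  return Object.entries(details).map(([field, value]) => ({ field, value }));
}

// Hypothetical helper: builds a product document in the array form.
function toProductDocument(productId, details) {
  return { productId, details: toDetailArray(details) };
}
```

For example, toProductDocument("p1", { Color: "red", Weight: "2 kg" }) returns { productId: "p1", details: [{ field: "Color", value: "red" }, { field: "Weight", value: "2 kg" }] }, which can then be inserted as-is.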
Do note that this will result in a very big index. Not necessarily a problem, but something to be aware of. The index will then be {"details.field": 1, "details.value": 1} (or just {details: 1} if you're not adding additional fields per detail).
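To make the querying side concrete, here is a minimal sketch that builds the index specification above and a filter for a case-insensitive lookup by detail name. `detailFilter` and `escapeRegExp` are hypothetical helpers; `$elemMatch` is the standard MongoDB operator that makes the name and value conditions apply to the same array element.

```javascript
// Index specification from the answer: a compound multikey index over the
// field/value pairs inside the "details" array.
const DETAILS_INDEX = { "details.field": 1, "details.value": 1 };

// Escape regex metacharacters so scraped names such as "Size (cm)"
// are matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: matches products that have a detail whose name equals
// `name` ignoring case and, if given, whose value equals `value`.
function detailFilter(name, value) {
  const cond = { field: new RegExp("^" + escapeRegExp(name) + "$", "i") };
  if (value !== undefined) cond.value = value;
  return { details: { $elemMatch: cond } };
}
```

With the Node.js driver these objects would be passed to collection.createIndex(DETAILS_INDEX) and collection.find(detailFilter("color", "red")). Keep in mind that case-insensitive regex queries generally cannot use an index efficiently, so on a large collection you may prefer to also store a lowercased copy of each field name, or to use an index with a case-insensitive collation.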