Design collection, how to de normalize

I have services that are available at many location at different prices. In transitional SQL I'd have the price_location table contain service_id and location_id do a join and group by when I would like to find services in certain region(s) showing highest and lowest price (region will select multiple locations).

Since service and location is very many to very many I'm thinking of the following:

service_location_price = [
  {
    serviceName:'s1';
    ,price:10
    ,location:'location1'
  },{//to keep it simple only serviceName is here but
     // there will be multiple providers for the same
     // serviceName at same location but different price
    serviceName:'s1';
    ,price:12
    ,location:'location1'
  },{
    serviceName:'s1';
    ,price:15
    ,location:'location2'
  }
];

Basically flat file data breaking second normal form (having repeating rows).

Now aggregate and or map reduce should work well getting the services in a certain region showing the minimum and maximum price. Or Showing locations available for certain service(s).

Both service and location have their own collection and the service_location_price collection copies some values for service and location for this query.

Some people have concerns about duplicate data and would like this implemented differently (mongoose populate with match??).

Not sure what my options are here so would appreciate some input from someone with possibly more experience. Are there better ways to get the search to

The service and location will not be updated much but relation between then may change, be added or deleted. But the search for services in regions will be executed very often.

Solution

A populate is a big $in query to resolve the references, and then it swaps out the references in the array for the corresponding documents. It's not so bad if the reference field is indexed, but it's an extra query and it's a crutch of bad schema design because it makes it easier to emulate a relational db when you aren't using a relational db and should be approaching the problem differently. I think it ought to be removed from Mongoose but it's sadly a little late for that :(

I'm not sure how you are modeling regions - you said a region can be multiple locations so I will model a region as an array of location values.

Total number of services in a given region:

db.service_location_price.distinct("serviceName", { "location" : { "$in" : region_array } })

This will give you an array of service names, so the .length will give the number of services.

Min/max price of a service in a region:

db.service_location_price.find({ "location" : { "$in" : region_array }, "serviceName" : "service1" }).sort({ "price" : 1 }).limit(1)
db.service_location_price.find({ "location" : { "$in" : region_array }, "serviceName" : "service1" }).sort({ "price" : -1 }).limit(1)

There's no information about suppliers of a service in the sample docs, so I don't know how to find the number of suppliers of a service in a region. Maybe you want to include a supplier field in the docs?