mongodb, group-by, distinct, doctrine-odm

MongoDB: limit the number of results with something like GROUP BY


I have a MongoDB collection with several levels of hierarchy. For this example I will use countries that contain cities: each document describes one city and holds the population of both the country and the city (country_pop and city_pop in the example). This is very simplified; in reality I have 6 levels of hierarchy and a large amount of data.

[
  {
    "country": "France",
    "city": "Paris",
    "country_pop": 63000000,
    "city_pop": 2200000,
    "year": 2015
  },
  {
    "country": "France",
    "city": "Marseille",
    "country_pop": 63000000,
    "city_pop": 850726,
    "year": 2015
  },
  {
    "country": "France",
    "city": "Toulouse",
    "country_pop": 63000000,
    "city_pop": 441802,
    "year": 2015
  },
  {
    "country": "France",
    "city": "Paris",
    "country_pop": 63500000,
    "city_pop": 2350000,
    "year": 2016
  },
  {
    "country": "France",
    "city": "Marseille",
    "country_pop": 63500000,
    "city_pop": 880726,
    "year": 2016
  },
  {
    "country": "France",
    "city": "Toulouse",
    "country_pop": 63500000,
    "city_pop": 445802,
    "year": 2016
  }
]

I am currently using Doctrine MongoDB ODM to hydrate my documents into PHP objects, but that is not a requirement. What I want is to get, in my PHP script, the values needed to display something like this:

  • France :
    • 2015: 63000000
    • 2016: 63500000

Currently, I fetch all documents that match {"country": "France"}, so in this example I get all 6 entries. With a large amount of data, it is wasteful to fetch 6 entries when 2 would do, one for 2015 and one for 2016, because the value of country_pop is the same in every entry that matches {"year": 2016, "country": "France"}.

During my tests, my PHP script used something like 100 MB to generate a timeline of my values over the years, which is not acceptable. I agree that my document structure is not very good, but I have no control over it.

Is there a way to do something like select country_pop ... groupBy("country", "year") so that I get only the minimum set of results needed?
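To make the desired result concrete, here is a minimal sketch in plain JavaScript that runs without MongoDB. It uses the sample documents from above, trimmed to the relevant fields. The function name distinctCountryPop is hypothetical and only illustrates the grouping I am after; it is not how I want to do it in practice, since it still requires loading every document first.

```javascript
// Sample rows copied from the example above (city_pop omitted for brevity).
const docs = [
  { country: "France", city: "Paris",     country_pop: 63000000, year: 2015 },
  { country: "France", city: "Marseille", country_pop: 63000000, year: 2015 },
  { country: "France", city: "Toulouse",  country_pop: 63000000, year: 2015 },
  { country: "France", city: "Paris",     country_pop: 63500000, year: 2016 },
  { country: "France", city: "Marseille", country_pop: 63500000, year: 2016 },
  { country: "France", city: "Toulouse",  country_pop: 63500000, year: 2016 }
];

// Hypothetical helper: keep one row per (country, year) pair.
// Assumes country_pop is identical for every row sharing that pair.
function distinctCountryPop(rows) {
  const seen = new Map();
  for (const { country, year, country_pop } of rows) {
    const key = `${country}|${year}`;
    if (!seen.has(key)) {
      seen.set(key, { country, year, country_pop });
    }
  }
  return [...seen.values()];
}

// Six input rows collapse to two: one for 2015 and one for 2016.
console.log(distinctCountryPop(docs));
```

The goal is to have the database return those two rows directly instead of computing them on the client.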

I found the group queries section in the Doctrine MongoDB ODM documentation: http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/query-builder-api.html#group-queries but it gives no real explanation.

Also, the MongoDB documentation for the "group" method (https://docs.mongodb.com/v3.2/reference/method/db.collection.group/) suggests it is meant for aggregations such as sum or count, which is not what I am looking for.


Solution

  • Try this for starters and let me know if you want any additional data:

    db.collectionName.aggregate([
        { 
            $group: { 
                "_id": { 
                    "Country": "$country", 
                    "Year": "$year", 
                    "CountryPop": "$country_pop" 
                } 
            } 
        }
    ])
    

    This groups your results by country, year, and country population, and produces the following for your data set:

    { "_id" : { "Country" : "France", "Year" : 2016, "CountryPop" : 63500000 } }
    { "_id" : { "Country" : "France", "Year" : 2015, "CountryPop" : 63000000 } }
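If you only need one country at a time, the same idea can be narrowed with a $match stage and flattened with $project. The sketch below is an assumption about what you want rather than a tested query against your data: the helper name countryPopPipeline is hypothetical, and collectionName stands in for your real collection.

```javascript
// Hypothetical helper: builds an aggregation pipeline for a single country.
function countryPopPipeline(country) {
  return [
    // Filter first so only this country's documents reach $group.
    { $match: { country: country } },
    // One group per distinct (year, country_pop) pair; no accumulator is needed.
    { $group: { _id: { year: "$year", country_pop: "$country_pop" } } },
    // Lift the grouped values out of _id into top-level fields.
    { $project: { _id: 0, year: "$_id.year", country_pop: "$_id.country_pop" } },
    // Oldest year first, ready for a timeline.
    { $sort: { year: 1 } }
  ];
}

// In the mongo shell this would be run as:
//   db.collectionName.aggregate(countryPopPipeline("France"))
console.log(JSON.stringify(countryPopPipeline("France"), null, 2));
```

On the sample data this should return one document per year, each with only year and country_pop, so the PHP side never has to load the per-city documents. Putting $match at the start of the pipeline also lets MongoDB use an index on country, if one exists.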