Search code examples
mongodbaggregation-frameworksubdocument

How can I return the minimum values from two subdocuments in a collection using MongoDB's aggregation pipeline?


We have a bunch of products in a database with two types of monetary values attached to each. Each object has a manufacturer, a range and a description, and each object can have a monthly rental amount (for rental agreements), a monthly payment amount (for finance agreements) or both.

An example object would be:

{
    "manufacturer":         "Manufacturer A",
    "range":                "Range A",
    "description":          "Product Description",
    "rentals": {
        "initialRental":    1111.05,
        "monthlyRental":    123.45,
        "termMonths":       24
    },
    "payments": {
        "deposit":          592.56,
        "monthlyPayment":   98.76,
        "finalPayment":     296.28,
        "termMonths":       36
    }
}

There can often be more than one object for a given manufacturer and range.

I'm looking for an aggregation pipeline that will return a list of the lowest monthly rental and the lowest monthly payment for each distinct manufacturer/range pair, but my limited knowledge of how to use the aggregation framework seems to be catching me out.

My intended result, if there were one distinct manufacturers with two distinct ranges, would be the following:

[
    {
        "manufacturer":     "Manufacturer A",
        "range":            "Range A",
        "minimumRental":    123.45,
        "minimumPayment":   98.76
    },
    {
        "manufacturer":     "Manufacturer A",
        "range":            "Range B",
        "minimumRental":    234.56,
        "minimumPayment":   197.53
    }
]

I'm using the following to try and achieve this, but I seem to be tripping up on the grouping and use of $min:

db.products.aggregate(
    [
        {
            "$group": {
                "_id": {
                    "manufacturer": "$manufacturer.name",
                    "range":        "$range.name"
                },
                "rentals": {
                    "$addToSet":    "$rentals.monthlyrental"
                },
                "payments": {
                    "$addToSet":    "$payments.monthlypayment"
                }
            }
        },
        {
            "$group": {
                "_id": {
                    "manufacturer": "$_id.manufacturer",
                    "range":        "$_id.range",
                    "payments":     "$payments"
                },
                "minimumRental": {
                    "$min": "$rentals"
                }
            }
        },
        {
            "$project": {
                "_id": {
                    "manufacturer":     "$_id.manufacturer",
                    "range":            "$_id.range",
                    "minimumRental":    "$minimumRental",
                    "payments":         "$_id.payments"
                }
            }
        },
        {
            "$group": {
                "_id": {
                    "manufacturer":     "$_id.manufacturer",
                    "range":            "$_id.range",
                    "minimumRental":    "$_id.minimumRental"
                },
                "minimumPayment": {
                    "$min":             "$_id.payments"
                }
            }
        },
        {
            "$project": {
                "_id": 0,
                "manufacturer":     "$_id.manufacturer",
                "range":            "$_id.range",
                "minimumRental":    "$_id.minimumRental",
                "minimumPayment":   "$minimumPayment"
            }
        }
    ]
)

It's worth noting, in the case with my test data, that I have deliberately not specified a rental for Range B, as there will be cases where rentals and/or payments are not both specified for a given range.

So, using the query above on my test data gives me the following:

{
    "0" : {
        "minimumPayment" : [ 
            98.76
        ],
        "manufacturer" : "Manufacturer A",
        "range" : "Range A",
        "minimumRental" : [ 
            123.45
        ]
    },
    "1" : {
        "minimumPayment" : [ 
            197.53
        ],
        "manufacturer" : "Manufacturer A",
        "range" : "Range B",
        "minimumRental" : []
    }
}

This is close, but it appears that I'm getting an array instead of a minimum value. I get the impression that what I'm trying to do is possible, but I don't seem to be able to find any resources specific enough to use to find out what I'm doing wrong.

Thanks for reading.


Solution

  • It's a bit complex but there is a little to understand here. First case is simplify and then just find the smallest amount for each

    db.collection.aggregate([
        // Tag things with an A/B value11
        { "$project": {
            "_id": {
                "manufacturer": "$manufacturer.name",
                "range": "$range.name",
            },
            "rental": "$rentals.monthlyRental",
            "payment": "$payments.monthlyPayment"
            "type": { "$literal": [ "R","P" ] }
        }},
    
        // Unwind that "type"
        { "$unwind": "$type" },
    
        // Group conditionally on the type
        { "$group": {
            "_id": {
                "_id": "$_id",
                "type": "$type"
            },
            "value": {
                "$min": {
                    "$cond": [
                        { "$eq": [ "$type", "R" ] },
                        "$rental",
                        "$payment"
                    ]
                }
            }
        }},
        // Sort by type and amount
        { "$sort": { "_id.type": 1, "value": 1 } },
    
        // Group by type only and just take the first after sort
        { "$group": {
            "_id": "$_id.type",
            "manufacturer": { "$first": "$_id._id.manufacturer" },
            "range": { "$first": "$_id._id.range" }
        }}
    ])
    

    And that's basically it, just clean up fields as you need with a $project or deal with it in code.


    Personally though I find that a bit sloppy and with a bit of overhead due to $unwind doing "A/B" values. A better approach would be to run each aggregation in parallel queries, then just merge the result to send to the client.

    I could bang on all day about parallel queries, but the basic example was in an answer I gave recently, so read How to Group By Different Fields which shows the general technique for doing this already.