Search code examples
mongodbaggregation-framework

How to find ids for products that are not on sale but exist in the product details collection using aggregation pipeline


I am joining a products collection on a productDetail collection where the product _id is the _id in the productDetail collection. I want to be able to find the products which are not on sale but exist in the productDetail table to include in my output as inProductDetailCollection for my reporting. I took a stab at it but it does not work. It always return true even when some _ids are not in the productDetail table. Here is my attempt. How can I get it to set the flag inProductDetailCollection to be false if missing or true if present


     db.getCollection("products").aggregate([
      {
        "$match": {
          "onSale": {
            "$exists": false
          }
        }
      },
      {
        "$lookup": {
          "from": "productDetail",
          "localField": "_id",
          "foreignField": "_id",
          "as": "productDetail"
        }
      },
      {
        "$match": {
          "$expr": {
            "$eq": [
              "$productDetail",
              []
            ]
          }
        }
      },
     
      {
        "$unwind": {
          "path": "$productDetail",
          "preserveNullAndEmptyArrays": true
        }
      },
      {
  
        "$addFields": {
          "onSale": false
        }
      },
      {
        "$group": {
          "_id": {
            "productId": "$_id",
            "onSale": "$onSale"
          }
        }
      },
      {
        "$project": {
          "productId": "$_id.productId",
          "onSale": "$_id.onSale",
          "inProductDetailCollection":
           {
                 $cond: { if: { $gt: [ "_id", 0 ] }, then: true, else: false }
          },  
          "_id": 0
        }
      }
    ]
  )

Here is a sample of the data. One thing to note is products in the productDetail may not have _ids present in the productDetail table since this table gets uploaded by a nightly batch job. Also, products which are not on sale will have no 'onSale': 'Y' attribute. This is what the data looks like:


    //products collection, some fields omitted for brevity

    [
      {
       "_id": 123345,
       "name": "NutraFast",
       "description": "Supplement",
       "price": 35.99
     },
     { 
       "_id": 13443,
       "name": "BerryBlast",
       "description": "Athletes Sports Drink",
       "price": 12.99
      },
      {
       "_id": 15644
       "name": "MagnoPower-11",
       "description": "Supplement",
       "price": 45.99
      }
     ,{
       "_id": 17011
       "name": "Zinc566",
       "description": "Supplement",
       "price": 25.99
     },
     {
       "_id": 15011
       "name": "VitaMax",
       "description": "Supplement",
       "price": 15.99
      },
      {
       "_id": 15311
        "name": "VitaMax",
         "description": "Supplement",
         "price": 15.99
       },
      { 
        "_id": 15316
        "name": "Chlorphyl Cleanse",
        "description": "Supplement",
        "price": 55.99
      }
     ]

Here is the productDetail collection sample


      //productDetail collection
      [
        {
         "_id": 123345,
         "discount": 0.3,
         "amount-per-unit": 50,
         "inStock": "Y"
       },
      {
        "_id": 13443,
        "discount": 0.5,
        "onSale": "Y",
        "amount-per-unit": 60,
        "inStock": "Y"
        "onSale":"Y"
      },
      {
        "_id" : 15644,
        "discount": 0.5,
        "onSale": "Y",
        "amount-per-unit": 60,
        "inStock": "Y"
      },

      {
        "_id" : 15011,
        "discount": 0.5,
        "amount-per-unit": 60,
        "inStock": "Y"
      },

      {
        "_id" : 17011,
        "discount": 0.5,
        "amount-per-unit": 60,
        "inStock": "Y"
      } 

    ]




Solution

  • It's very difficult to understand what you actually need. However, I have taken the liberty of correcting some invalid object structure in your sample documents and below are two possible aggregations that should hopefully get you the output you're looking for:

    Example 1

    This will return only the products documents that have:

    • a matching productDetail
    • with a field named productDetail that includes the $lookup document
    • with a field named inProductDetailCollectionButNotOnSale
    • which will be true only if the product is not on sale
    • or false otherwise
    db.getCollection("products").aggregate([
      {
        $lookup: {
          from: "productDetail",
          localField: "_id",
          foreignField: "_id",
          as: "productDetail"
        }
      },
      {
        $unwind: {
          path: "$productDetail"
        }
      },
      {
        $set: {
          inProductDetailCollectionButNotOnSale: {
            $cond: [
              {
                $eq: [
                  "$productDetail.onSale",
                  "Y"
                ]
              },
              false,
              true
            ]
          }
        }
      }
    ])
    

    See HERE for a working example.


    Example 2

    This will return all products documents and includes:

    • a field named productDetail that includes the $lookup document (if any)
    • with a field named inProductDetailCollectionButNotOnSale
    • which will be true only if the product has a matching productDetail and is not on sale
    • or false otherwise
    db.getCollection("products").aggregate([
      {
        $lookup: {
          from: "productDetail",
          localField: "_id",
          foreignField: "_id",
          as: "productDetail"
        }
      },
      {
        $unwind: {
          path: "$productDetail",
          preserveNullAndEmptyArrays: true
        }
      },
      {
        $set: {
          inProductDetailCollectionButNotOnSale: {
            $cond: [
              {
                $and: [
                  {
                    $gt: [
                      "$productDetail",
                      0
                    ]
                  },
                  {
                    $ne: [
                      "$productDetail.onSale",
                      "Y"
                    ]
                  }
                ]
              },
              true,
              false
            ]
          }
        }
      }
    ])
    

    See HERE for a working example.

    Note: for clarity and in case anyone else wonders why this part works:

    $gt: [
       "$productDetail",
       0
    ]
    

    Mongodb has a Comparison/Sort Order which lists the BSON types in order from lowest to highest. What I have done is compare the $productDetail field against a Number. Since Object is 4th on the list and Number is 3rd on the list it basically means if $productDetail exists in the document then it will be on level 4 which is higher ($gt) than a the number zero which is on level 3 on the list.

    It returns true because it essentially translates to:

    $gt: [
       4, // 4 is greater than 3
       3
    ]
    

    If $productDetail doesn't exist then it will return false because a null is on level 2 and Number is higher on level 3 so it translates to:

    $gt: [
       2, // 2 is NOT greater than 3
       3
    ]