Search code examples
mongodbaggregation-frameworknosql-aggregation

How do you use MongoDB $lookup to return a substring, then check things in another collection with that substring


I am trying to use the MongoDB Aggregation pipeline to pull items from one collection, check a string (a url), extract a substring from that, then check in a second collection for a matching document to said substring off one of the second collection document fields.

collection1 document:

{
    _id: ObjectId('xxxxxxxxxxxxxxxxxxx'),
    url: 'https://example.com/WNE8UH'
}

edit: The productId is not always located at the end of the url string. Sometimes it's right in the middle, which forces me to extract it before any comparison step.

collection2 document:

{
    _id: ObjectId('xxxxxxxxxxxxxxxxxxx'),
    productId: 'WNE8UH'
}

See the url in collection1 contains WNE8UH which is productId in collection2?

How do you use aggregation to return the substring of url, then use $lookup to locate the document in collection2 productId field?

Here is my current code:

db.collection1.aggregation([
    { $match: { url: RegExp( 'example.com', 'i' ) } },
    { $lookup: {
        from: 'collection2',
        
        let: {
            productId: { 
                $regexFind: { 
                    input: "$url", 
                    regex: '(?<=com\/)\w{6}'
             }
           }
        },
        pipeline: [
        {
            $match: { productId: '$$productId'
                }
        }
        ],
        as: 'matching_doc'
        
    }}
])

Result

{
    _id: ObjectId('xxxxxxxxxxxxxxxxxxx'),
    url: 'https://example.com/WNE8UH',
    matching_doc: []
}

Getting empty array. I need to get the matching doc.

What am I doing wrong?


Solution

  • Query

    • lookup if url contains the productId in the end of the URL

    *if this regex is not ok for your case you can use any other
    *if you want to use an index to match the url you should create the index and also use regex with ^ like ^https://example.com/ (i mean before the lookup like in your query)

    Playmongo

    coll1.aggregate(
    [{"$lookup": 
       {"from": "coll2",
        "let": {"url": "$url"},
        "pipeline": 
         [{"$match": 
             {"$expr": 
               {"$regexMatch": 
                 {"input": "$$url",
                  "regex": {"$concat": ["$productId", {"$literal": "$"}]}}}}}],
        "as": "matching_doc"}}])
    

    Edit

    In case that the productId is not in the end but it is always after the domain name here example.com you can use something like

    Playmongo

    coll1.aggregate(
    [{"$lookup":
       {"from":"coll2",
        "pipeline":
         [{"$match":
             {"$expr":
               {"$regexMatch":
                 {"input":"$$url",
                  "regex":
                   {"$concat":["^https://example.com/", "$productId", "*"]}}}}}],
        "as":"matching_doc",
        "let":{"url":"$url"}}}])
    

    If it can be anywhere inside the URL you can use something like the bellow

    coll1.aggregate(
    [{"$lookup": 
       {"from": "coll2",
        "pipeline": 
         [{"$match": 
             {"$expr": 
               {"$regexMatch": {"input": "$$url", "regex": "$productId"}}}}],
        "as": "matching_doc",
        "let": {"url": "$url"}}}])