Search code examples
mongodbaggregation-frameworkspring-data-mongodb

Any functionality in a MongoDB Aggregation like a simple "regex replace"?


I'm trying to create a step in a MongoDB aggregation pipeline which will replace the value of a full Social Security Number (like "123-45-6789") with a masked value, preserving the last four digits (like "XXX-XX-6789").

I see $replaceAll and $replaceOne but it looks like those only accept string literals, not regex expressions. I see there is a regex find that looks like it can very clumsily fit into a replace stage:

"maskedSsn" : {
    "$replaceOne" : { 
        "input" : "$ssn",
        "find" : { 
            "$regexFind" : { 
                "input" : "$ssn", 
                "regex" : "\\d{3}-\\d{2}\\-"
            }
        },
        "replacement" : "XXX-XX-"
    }
}

But with this I get the error "$replaceOne requires that 'find' be a string, found: {match: \"000-00-\", idx: 0, captures: []} (the SSN is mocked, random data here). I'm not sure how to just get the "match" out of the resultant object as part of this step. I could add this regex match as a different field, then get the match in this stage of my aggregation for a simple string replacement, but I would prefer to keep all the logic in one step if possible.

It's hard for me to believe that this isn't possible in one step, with something like a regex replace (which is a feature of every language I've used). Is there a functionality that I'm missing?


Solution

  • There is no generalized regexReplace operator in MongoDB.

    You could kind of get where you want using $regexFind, like

    {$regexFind: {
       input:"$ssn",
       regex:"^\\d{3}-\\d{2}-(\\d{4})$"
    }}
    

    That would return an object similar to:

    { 
      match: "123-45-6789",
      idx: 0
      captures: ["6789"]
    }
    

    You could use $getField, $arrayElemAt and $concat to extract the captures part, then do what you need with that.

    db.collection.aggregate([
      {$addFields: {
          maskedssn: {
            $concat: [
              "XXX-XX-",
              {$arrayElemAt: [
                  {$getField: {
                      field: "captures",
                      input: {$regexFind: {
                          input: "$ssn",
                          regex: "^\\d{3}-\\d{2}-(\\d{4})$"
                        }}
                  }},
                  0
              ]}
            ]
          }
     }}
    ])
    

    Playground