I'm trying to create a step in a MongoDB aggregation pipeline which will replace the value of a full Social Security Number (like "123-45-6789"
) with a masked value, preserving the last four digits (like "XXX-XX-6789"
).
I see $replaceAll
and $replaceOne
but it looks like those only accept string literals, not regex expressions. I see there is a regex find that looks like it can very clumsily fit into a replace stage:
"maskedSsn" : {
"$replaceOne" : {
"input" : "$ssn",
"find" : {
"$regexFind" : {
"input" : "$ssn",
"regex" : "\\d{3}-\\d{2}\\-"
}
},
"replacement" : "XXX-XX-"
}
}
But with this I get the error "$replaceOne requires that 'find' be a string, found: {match: \"000-00-\", idx: 0, captures: []}
(the SSN is mocked, random data here). I'm not sure how to just get the "match"
out of the resultant object as part of this step. I could add this regex match as a different field, then get the match in this stage of my aggregation for a simple string replacement, but I would prefer to keep all the logic in one step if possible.
It's hard for me to believe that this isn't possible in one step, with something like a regex replace (which is a feature of every language I've used). Is there a functionality that I'm missing?
There is no generalized regexReplace operator in MongoDB.
You could kind of get where you want using $regexFind, like
{$regexFind: {
input:"$ssn",
regex:"^\\d{3}-\\d{2}-(\\d{4})$"
}}
That would return an object similar to:
{
match: "123-45-6789",
idx: 0
captures: ["6789"]
}
You could use $getField, $arrayElemAt and $concat to extract the captures part, then do what you need with that.
db.collection.aggregate([
{$addFields: {
maskedssn: {
$concat: [
"XXX-XX-",
{$arrayElemAt: [
{$getField: {
field: "captures",
input: {$regexFind: {
input: "$ssn",
regex: "^\\d{3}-\\d{2}-(\\d{4})$"
}}
}},
0
]}
]
}
}}
])