
Extract a substring from value of key-value pair using regex


I have a string in a log and I want to mask certain values using a regex.

For example:

"email":"testEmail@test.com", "phone":"1111111111", "text":"sample text may contain email testEmail@test.com as well"

The regex should mask

  1. the email value, both after "email" and wherever it appears inside the "text" value
  2. phone number

Desired output:

"email":"*****", "phone":"*****", "text":"sample text may contain email ***** as well"

So far I have been able to mask the email and phone values individually, but not the email address that appears inside the "text" value.

Regex developed so far:

(?<=\"(?:email|phone)\"[:])(\")([^\"]*)(\")

https://regex101.com/r/UvDIjI/2/
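
To make the gap concrete, here is a minimal sketch of that attempt run in Python. The choice of Python and the names `ATTEMPT`, `line`, and `masked` are assumptions for illustration; the question does not say which regex flavor or language is in use.

```python
import re

# The pattern from the question, unchanged. The lookbehind anchors on
# "email": or "phone": and the three groups capture the opening quote,
# the value, and the closing quote.
ATTEMPT = re.compile(r'(?<=\"(?:email|phone)\"[:])(\")([^\"]*)(\")')

line = ('"email":"testEmail@test.com", "phone":"1111111111", '
        '"text":"sample text may contain email testEmail@test.com as well"')

# Keep the quotes (groups 1 and 3) and replace only the value (group 2).
masked = ATTEMPT.sub(r'\1*****\3', line)
print(masked)
```

The email and phone values are masked, but the address inside "text" is left as it was, because "text" is not one of the keys in the lookbehind.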


Solution

  • Your first part does not validate the email address; it simply matches any characters other than a double quote. You can take the same loose approach to the email address inside the text by matching runs of characters that are not a double quote, an @, or whitespace on either side of an @.

    One way to do this is to get the matches using lookarounds and an alternation, then replace each match with *****.

    Note that you don't have to escape the double quote, and the colon can be written without a character class.

    (?<="(?:phone|email)":")[^"]+(?=")|[^@"\s]+@[^@"\s]+
    

    Explanation

    • (?<="(?:phone|email)":") Assert that what is directly to the left is either "phone":" or "email":"
    • [^"]+(?=") Match one or more characters other than a double quote, and assert that a double quote follows
    • | Or
    • [^@"\s]+@[^@"\s]+ Match an email-like pattern: an @ with, on each side, one or more characters that are not an @, a double quote, or whitespace
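
As a rough sketch of how the pattern above could be applied, the following uses Python's `re.sub`. Python is an assumption here, since the question does not name a language, and `MASK_PATTERN`, `mask`, and `sample` are hypothetical names. Python's `re` module requires fixed-width lookbehinds, which happens to hold because "phone" and "email" are both five characters long.

```python
import re

# First alternative: the whole value after "phone":" or "email":".
# Second alternative: any email-like token, wherever it appears.
MASK_PATTERN = re.compile(
    r'(?<="(?:phone|email)":")[^"]+(?=")|[^@"\s]+@[^@"\s]+'
)


def mask(line: str) -> str:
    """Replace every match of MASK_PATTERN with five asterisks."""
    return MASK_PATTERN.sub('*****', line)


sample = ('"email":"testEmail@test.com", "phone":"1111111111", '
          '"text":"sample text may contain email testEmail@test.com as well"')

print(mask(sample))
```

Because the lookarounds do not consume the surrounding quotes, the replacement string needs no backreferences. Keys of other lengths would need a different approach in Python, for example capturing the key in a group instead of using a lookbehind.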

    See the regex demo