
Match multiple words in a string and replace them with a single regexp


I have to replace several different words in a single string.

The strings are MX bounce messages, and I have to anonymize them. Three examples:

mxpcfe06.ad.aruba.it bizsmtp SVmpm7g94MdQ9 Connessione rifiutata da 198.61.254.38 / Connection refused from 198.61.254.38
mxdhfe07.ad.aruba.it bizsmtp SUcvm9LnxwUJg Connessione rifiutata da 198.61.254.38 / Connection refused from 198.61.254.38. 
mxdhfe10.ad.aruba.it bizsmtp SSG4mYpjIE14Z Connessione rifiutata da 198.61.254.38 / Connection refused from 198.61.254.38.

The result for each of them should be:

ARUBA_HOST bizsmtp ARUBA_HASH Connessione rifiutata da ARUBA_IP_ADDRESS / Connection refused from ARUBA_IP_ADDRESS

Is it possible to do this with a single regular expression? If not, can I chain multiple statements, each matching the whole sentence, to get the desired result?

Example of the chained approach, one replacement per step:

ARUBA_HOST bizsmtp SVmpm7g94MdQ9 Connessione rifiutata da 198.61.254.38 / Connection refused from 198.61.254.38
ARUBA_HOST bizsmtp ARUBA_HASH Connessione rifiutata da 198.61.254.38 / Connection refused from 198.61.254.38
ARUBA_HOST bizsmtp ARUBA_HASH Connessione rifiutata da ARUBA_IP_ADDRESS / Connection refused from ARUBA_IP_ADDRESS

It is important to match the whole sentence, because bounces from other providers may arrive as well and their IP addresses must not be replaced with the wrong placeholder (ARUBA_IP_ADDRESS).


Solution

  • With regexp_replace in Oracle, it would look like this:

     regexp_replace(source_value,
          '^(\w+\.)*(\w+)\.\w+( .* da )(\d+\.){3}\d+(.* from )(\d+\.){3}\d+',
          '\2_host\3\2_ipaddress\5\2_ipaddress')
    

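    The provider name is taken from capture group \2, so this produces lower-case placeholders such as aruba_host and aruba_ipaddress. Changing "aruba" (in the example) to upper-case ARUBA within the same call is not so easy. Note also that this pattern leaves the hash token unchanged.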
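
If the anonymization can run outside Oracle, a replacement callback makes the upper-casing straightforward. The following is only a minimal sketch in Python, not a translation of the Oracle call above: the names BOUNCE and anonymize are made up for illustration, and it assumes (based on the three sample lines only) that the hash is the third whitespace-separated token and that the provider is the second-level label of the host name.

```python
import re

# Hypothetical pattern, modelled on the three sample bounces only.
BOUNCE = re.compile(
    r'^(?:\w+\.)*(?P<provider>\w+)\.\w+'   # host name; provider = label before the TLD
    r'(?P<mid> \w+ )(?P<hash>\w+)'         # e.g. " bizsmtp " followed by the hash token
    r'(?P<da> .* da )(?:\d+\.){3}\d+'      # text up to and including " da ", then first IP
    r'(?P<frm>.* from )(?:\d+\.){3}\d+'    # text up to and including " from ", then second IP
)


def anonymize(line: str) -> str:
    """Replace host, hash and both IPs; lines that do not match are returned unchanged."""
    def repl(m: re.Match) -> str:
        p = m.group('provider').upper()    # "aruba" -> "ARUBA"
        return (
            f"{p}_HOST{m.group('mid')}{p}_HASH"
            f"{m.group('da')}{p}_IP_ADDRESS"
            f"{m.group('frm')}{p}_IP_ADDRESS"
        )
    return BOUNCE.sub(repl, line, count=1)


if __name__ == "__main__":
    sample = ("mxpcfe06.ad.aruba.it bizsmtp SVmpm7g94MdQ9 Connessione rifiutata da "
              "198.61.254.38 / Connection refused from 198.61.254.38")
    print(anonymize(sample))
```

Because the pattern has to match the whole sentence shape (host, hash, " da " plus IP, " from " plus IP), a bounce with a different wording is left untouched rather than receiving a wrong placeholder. Any trailing text after the second IP, such as a final period, is kept as-is.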