Tags: java, regex, replaceall

How to ensure replaceAll replaces a whole word and not a substring


I have a dictionary as input. I iterate over it and replace each key that occurs in the text with its value. But replaceAll also replaces a key when it occurs as a substring of a longer token.

How can I ensure that it matches only the whole word, and not a substring?

String text = "Synthesis of 1-(2,6-dimethylbenzyl)-1H-indole-6-carboxylic acid [69-3] The titled compound (883 mg) sdvfshd[69-3]3456 as a white solid was prepared";

dictionary = {[69-3]=1-(2,6-dimethylbenzyl)-1H-indole-6-carboxylic acid }

for (Map.Entry<String, String> entry : dictionary.entrySet()) {
    text = text.replaceAll("\\b" + Pattern.quote(entry.getKey()) + "\\b", entry.getValue());
}

Solution

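  • replaceAll takes a regular expression as its first parameter.

    In regular expressions, you have word boundaries: \b (write \\b in a string literal). They're the best way to ensure you're matching a word and not part of a word: "\\bword\\b"

    But in your case, you can't use word boundaries, as you're not looking for a word ([69-3] isn't a word). \b only matches between a word character and a non-word character, and your key starts and ends with the non-word characters [ and ]. So "\\b" before [ requires a letter, digit or underscore right before the key, which is exactly the embedded case you want to exclude.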
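To make the word-boundary problem concrete, here is a small sketch that runs the pattern from the question on a standalone key and on an embedded one. The class and method names are made up for illustration, and the replacement "X" is just a placeholder.

```java
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Builds the pattern from the question: the quoted key wrapped in \b.
    static String replaceWithWordBoundaries(String text, String key, String value) {
        return text.replaceAll("\\b" + Pattern.quote(key) + "\\b", value);
    }

    public static void main(String[] args) {
        // Standalone key: spaces on both sides, so there is no \b next to '[' or ']'.
        System.out.println(replaceWithWordBoundaries("acid [69-3] The", "[69-3]", "X"));
        // prints: acid [69-3] The

        // Embedded key: a letter before and a digit after, so both \b succeed.
        System.out.println(replaceWithWordBoundaries("sdvfshd[69-3]3456", "[69-3]", "X"));
        // prints: sdvfshdX3456
    }
}
```

So with this key, \b matches the embedded occurrence and skips the standalone one, the reverse of what is wanted.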

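    I suggest this:

    text=text.replaceAll("(?<=\\W|^)"+Pattern.quote("[69-3]")+"(?=\\W|$)", ...

    The idea is to require, on each side of the key, either an end of the string or a character that's not a word character. The lookbehind (?<=\W|^) checks the character before the key and the lookahead (?=\W|$) checks the character after it; neither consumes anything. I can't guarantee this is the right solution for you though: such a pattern must be tuned to the exact, complete use case.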
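A minimal, self-contained sketch of the dictionary loop with the key wrapped in lookarounds. It uses a lookbehind for the character before the key and a lookahead for the character after it. The class and method names are hypothetical, the sample text is shortened from the question, and the call to Matcher.quoteReplacement is an addition of mine, there only in case a value contains $ or \.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeKeyReplace {
    // Replaces each key only where it is not glued to a word character
    // (letter, digit or underscore) on either side.
    static String replaceWholeKeys(String text, Map<String, String> dictionary) {
        for (Map.Entry<String, String> entry : dictionary.entrySet()) {
            String regex = "(?<=\\W|^)" + Pattern.quote(entry.getKey()) + "(?=\\W|$)";
            // quoteReplacement keeps '$' and '\' in the value from being
            // interpreted as group references or escapes.
            text = text.replaceAll(regex, Matcher.quoteReplacement(entry.getValue()));
        }
        return text;
    }

    public static void main(String[] args) {
        Map<String, String> dictionary = new LinkedHashMap<>();
        dictionary.put("[69-3]", "1-(2,6-dimethylbenzyl)-1H-indole-6-carboxylic acid");

        String text = "acid [69-3] The titled compound sdvfshd[69-3]3456 as a white solid";
        // Only the standalone [69-3] is replaced; sdvfshd[69-3]3456 is left alone.
        System.out.println(replaceWholeKeys(text, dictionary));
    }
}
```

Whether punctuation such as a comma or a parenthesis next to the key should count as a valid boundary depends on your data; here any non-word character does.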

    Note that if all your keys follow a similar pattern, there might be a better solution than iterating through the dictionary. You might, for example, use a single pattern like "(?<=\\W|^)\\[\\d+\\-\\d+\\](?=\\W|$)" and look up each match in the dictionary.
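The single-pattern idea could be sketched as follows, assuming every key has the form [digits-digits]. The class name, method name and sample strings are hypothetical. Leaving unknown keys untouched is one possible choice, not something the question specifies.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SinglePassReplace {
    // Any bracketed "digits-digits" key that is not glued to a word character.
    static final Pattern KEY = Pattern.compile("(?<=\\W|^)\\[\\d+\\-\\d+\\](?=\\W|$)");

    static String replaceKnownKeys(String text, Map<String, String> dictionary) {
        Matcher m = KEY.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Keys that are missing from the dictionary are written back unchanged.
            String value = dictionary.getOrDefault(m.group(), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> dictionary = new HashMap<>();
        dictionary.put("[69-3]", "1-(2,6-dimethylbenzyl)-1H-indole-6-carboxylic acid");
        System.out.println(replaceKnownKeys("acid [69-3] The sdvfshd[69-3]3456", dictionary));
    }
}
```

This scans the text once regardless of how many keys the dictionary holds, and a replaced value is never rescanned for other keys.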