java regex regex-lookarounds punctuation

Regex and lookahead: Java


I'm trying to remove punctuation, except dots (to keep the sentence structure), from a String with a regex. Honestly, I have no clue how it works; I just wrote this:

public static String removePunctuation(String s) {
    s = s.replaceAll("(?!.)\\p{Punct}", " ");
    return s;
}

I found that a "negative lookahead" can be used for this kind of problem, but when I run this code, it doesn't erase anything. It seems the negative lookahead cancels out the \p{Punct} regex.
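To reproduce the symptom in isolation, here is a minimal runnable sketch. The class name LookaheadDemo and the sample sentence are illustrative assumptions, not part of the question; the method body uses the question's own pattern and shows that the output comes back identical to the input.

```java
class LookaheadDemo {

    // The question's pattern. (?!.) succeeds only if the character at the
    // current position is NOT matched by '.', i.e. it is a line terminator
    // (or we are at the end of the input). \p{Punct} must then match that
    // same character as punctuation, which never happens.
    static String removePunctuation(String s) {
        return s.replaceAll("(?!.)\\p{Punct}", " ");
    }

    public static void main(String[] args) {
        String input = "Hello, world! How are you? Fine.";
        String output = removePunctuation(input);
        // Nothing is replaced, so the string comes back unchanged.
        System.out.println(output);
        System.out.println(input.equals(output)); // true
    }
}
```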


Solution

  • The . character has a special meaning in regular expressions. It essentially means 'any character except line terminators' (unless the DOTALL flag is specified, in which case it means 'any character'). Because the lookahead and \p{Punct} are checked at the same position, your pattern matches 'any punctuation character that is also a line terminator', which in practice never matches anything.

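    If you want it to mean a literal . character, you need to escape it with a backslash. Because the regex lives in a Java string literal, the backslash itself must be escaped too, so you write \\. like this: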

    s = s.replaceAll("(?!\\.)\\p{Punct}" , " ");      
    

    Or wrap it in a character class, where . loses its special meaning, like this:

    s = s.replaceAll("(?![.])\\p{Punct}" , " ");
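Putting the two fixes side by side, a minimal sketch might look like the following. The class name RemovePunctuationDemo and the sample input are hypothetical. It also includes a third variant that is not part of the answer: java.util.regex supports character class intersection, so [\p{Punct}&&[^.]] expresses "punctuation but not a dot" without any lookahead. The patterns are compiled once up front, since String.replaceAll compiles its regex again on every call.

```java
import java.util.regex.Pattern;

class RemovePunctuationDemo {

    // Fix 1: escape the dot, so the lookahead means "this char is not a literal '.'".
    // The backslash is doubled because this is a Java string literal.
    static final Pattern ESCAPED = Pattern.compile("(?!\\.)\\p{Punct}");

    // Fix 2: inside a character class, '.' has no special meaning.
    static final Pattern CHAR_CLASS = Pattern.compile("(?![.])\\p{Punct}");

    // Alternative (not from the answer): character class intersection,
    // "punctuation AND not a dot", with no lookahead at all.
    static final Pattern INTERSECTION = Pattern.compile("[\\p{Punct}&&[^.]]");

    // Replaces every match of the given pattern with a space, as in the answer.
    static String removePunctuation(String s, Pattern p) {
        return p.matcher(s).replaceAll(" ");
    }

    public static void main(String[] args) {
        String input = "Hello, world! How are you? Fine.";
        for (Pattern p : new Pattern[] {ESCAPED, CHAR_CLASS, INTERSECTION}) {
            // Each variant prints: Hello  world  How are you  Fine.
            System.out.println(removePunctuation(input, p));
        }
    }
}
```

All three variants replace every punctuation mark other than . with a space. They keep the answer's choice of " " as the replacement; pass "" instead if the characters should be deleted outright.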