Tags: java, regex, assertion, regex-lookarounds, lookbehind

Regex capture lookbehind and lookahead


I'm trying to write a regex for the following situations:

badword%
%badword
%badword%

The % signs behave differently depending on where they are. A % at the front needs a lookbehind to match the letters preceding the word badword until it reaches a non-letter. Likewise, any % that is not at the front needs a lookahead to match the letters following the word badword until it hits a non-letter.

Here's what I'm trying to achieve. If I have the following:

Just a regular superbadwording sentence.

badword   # should match "badword", easy enough
badword%  # should match "badwording"
%badword% # should match "superbadwording"

At the same time, if I have a similar sentence:

Here's another verybadword example.

badword   # should match "badword", easy enough
badword%  # should also match "badword"
%badword% # should match "verybadword"

I don't want to use spaces as the delimiters in the assertions. Assume that the characters I want to capture are \w.

Here's what I have so far, in Java:

String badword  = "%badword%";
String _badword = badword.replace("%", "");
// match a % NOT at the beginning of the string, replace it with a lookahead that captures \w (not working)
badword = badword.replaceAll("^(?!%)%", "(?=\\\\w)");
// match a % at the beginning of the string, replace it with a lookbehind that captures \w (not working)
badword = badword.replaceAll("^%", "(?!=\\\\w)");
System.out.println(badword); // ????

So, how can I accomplish this?

PS: Please don't assume the % signs are always at the start and end of the pattern. If a % is the first character, it needs a lookbehind; any and all other % signs need lookaheads.
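One point worth keeping in mind with the lookaround idea: lookbehinds and lookaheads are zero-width, so they can test what surrounds badword but never add those characters to the match. The minimal sketch below (the class name LookaroundDemo and the helper findAll are hypothetical) compares a lookaround pattern with a consuming \w* pattern on the first example sentence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundDemo {
    // Hypothetical helper: collects every match of regex in text, in order.
    static List<String> findAll(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Just a regular superbadwording sentence.";
        // The lookbehind/lookahead only assert that a \w character sits on each side; they consume nothing.
        System.out.println(findAll("(?<=\\w)badword(?=\\w)", text)); // [badword]
        // \w* consumes the neighbouring word characters, so they become part of the match.
        System.out.println(findAll("\\w*badword\\w*", text)); // [superbadwording]
    }
}
```

So even a correctly written lookaround would still report only "badword"; to get "superbadwording" the surrounding letters have to be consumed.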


Solution

  • From your question it doesn't seem necessary to use lookarounds at all, so you could just replace every % with \w*.

    Snippet:

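    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Snippet {
        public static void main(String[] args) {
            String tested = "Just a regular superbadwording sentence.";
            String bad = "%badword%";
            // "\\\\w*" is the replacement text \\w*, which replaceAll turns into \w*
            bad = bad.replaceAll("%", "\\\\w*");
            Pattern p = Pattern.compile(bad);
            Matcher m = p.matcher(tested);
            while (m.find()) {
                String found = m.group();
                System.out.println(found); // prints superbadwording
            }
        }
    }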
    

    \w doesn't match #, -, etc., so I think \S is a better fit here.
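Building on the \w* replacement, here is one possible way to handle a % in any position (as the PS asks) and to make the character class configurable, so \S can be swapped in for \w. The class and method names (WildcardPattern, toRegex, findAll) are hypothetical. It also quotes the literal parts with Pattern.quote, an extra precaution not in the answer above, so a bad word containing regex metacharacters such as . or + is matched literally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WildcardPattern {
    // Hypothetical helper: turns a %-pattern such as "%badword%" into a regex.
    // Every % (start, middle or end) becomes charClass followed by '*';
    // the literal segments between them are quoted with Pattern.quote.
    static String toRegex(String pattern, String charClass) {
        String[] parts = pattern.split("%", -1); // -1 keeps empty leading/trailing segments
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(charClass).append('*'); // one wildcard per % separator
            }
            if (!parts[i].isEmpty()) {
                sb.append(Pattern.quote(parts[i]));
            }
        }
        return sb.toString();
    }

    // Returns every match of the translated pattern in text.
    static List<String> findAll(String pattern, String charClass, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(toRegex(pattern, charClass)).matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String s1 = "Just a regular superbadwording sentence.";
        String s2 = "Here's another verybadword example.";
        for (String p : new String[] {"badword", "badword%", "%badword%"}) {
            System.out.println(p + " -> " + findAll(p, "\\w", s1) + " " + findAll(p, "\\w", s2));
        }
    }
}
```

With "\\w" this reproduces the expectations from the question: badword%, for example, yields "badwording" in the first sentence and "badword" in the second. Passing "\\S" instead follows the comment above, with the trade-off that \S* also swallows trailing punctuation such as a period directly after the word.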