
Java replaceAll regex With Similar Result


Alright folks, my brain is fried. I'm trying to fix up some EMLs with bad boundaries by replacing the incorrect

--Boundary_([ArbitraryName])

lines with more proper

--Boundary_([ArbitraryName])--

lines, while leaving already correct

--Boundary_([ThisOneWasFine])--

lines alone. I've got the whole message in-memory as a String (yes, it's ugly, but JavaMail dies if it tries to parse these), and I'm trying to do a replaceAll on it. Here's the closest I can get.

//Identify boundary lines that do not end in --
String regex = "^--Boundary_\\([^\\)]*\\)$";
Pattern pattern = Pattern.compile(regex,
    Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
Matcher matcher = pattern.matcher(targetString);
//Store all of our unique results.
HashSet<String> boundaries = new HashSet<String>();
while (matcher.find())
    boundaries.add(matcher.group());
//Add "--" at the end of the Strings we found.
for (String boundary : boundaries)
    targetString = targetString.replaceAll(Pattern.quote(boundary),
        boundary + "--");

This has the obvious problem of replacing all of the valid

--Boundary_([WasValid])--

lines with

--Boundary_([WasValid])----

However, this is the only setup I've gotten to even perform the replacement. If I try changing Pattern.quote(boundary) to Pattern.quote(boundary) + "$", nothing is replaced. If I try just using matcher.replaceAll("$0--") instead of the two loops, nothing is replaced. What's an elegant way to achieve my aim and why does it work?


Solution

  • There's no need to iterate through the matches with find(); that's part of what replaceAll() does.

    s = s.replaceAll("(?im)^--Boundary_\\([^\\)]*\\)$", "$0--");
    

    The $0 in the replacement string is a placeholder for whatever the regex matched in this iteration.

    The (?im) at the beginning of the regex turns on CASE_INSENSITIVE and MULTILINE modes.
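
As a minimal, self-contained sketch of the one-liner above: the class name `BoundaryFixer`, the method name `closeBoundaries`, and the sample message are made up for illustration and do not come from the question. It shows unterminated boundary lines gaining a trailing `--` while lines that already end in `--` are left alone, because `$` cannot match in front of the existing `--`.

```java
import java.util.regex.Pattern;

public class BoundaryFixer {
    // (?i) = CASE_INSENSITIVE, (?m) = MULTILINE, so ^ and $ match at every line.
    private static final Pattern OPEN_BOUNDARY =
            Pattern.compile("(?im)^--Boundary_\\([^\\)]*\\)$");

    /** Appends "--" to boundary lines that do not already end in "--". */
    public static String closeBoundaries(String message) {
        // $0 stands for the whole text matched by the current match.
        return OPEN_BOUNDARY.matcher(message).replaceAll("$0--");
    }

    public static void main(String[] args) {
        // Hypothetical sample message using CRLF line endings.
        String eml = "--Boundary_(ID_abc)\r\n"
                   + "body text\r\n"
                   + "--Boundary_(ID_ok)--\r\n";
        System.out.println(closeBoundaries(eml));
    }
}
```

Two details are worth keeping in mind. `replaceAll` returns a new String rather than modifying its input, so the result has to be assigned back. And without MULTILINE mode, `$` only matches at the end of the input (or just before a final line terminator), which would explain why appending `"$"` to `Pattern.quote(boundary)` replaced nothing in the middle of the message.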