I have tried the below regular expression:
final String REG="\\Q[\\E((Bird)|(Animal)): .*\\Q]\\E";
System.out.println(input.replaceAll(REG," "));
to replace every "[Bird:*]" and "[Animal:*]" token with a space.
For example, given the input string
[Bird: Peacock] national bird [India], colorful. [Bird: Crow] crow is black [Animal: Cow] cow gives milk
the actual output is:
cow gives milk
It matched from [Bird: to the last ] of the given string. But the expected result is
national bird [India], colorful. crow is black cow gives milk
Can anyone help with this?
The * quantifier is greedy by default, so, just as you noticed, it matches the longest possible range of text: from [Bird: to the last ] in the string. You can make it a reluctant quantifier by adding ? after it, so try
final String REG="\\Q[\\E((Bird)|(Animal)): .*?\\Q]\\E";
// ^ - make `*` reluctant
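To see the difference on the input from the question, here is a minimal sketch that runs both patterns side by side. The class name GreedyVsReluctant and the helper strip are made up for illustration; only String.replaceAll comes from the question.

```java
// Hypothetical demo class; the names here are not from the question.
class GreedyVsReluctant {
    // Original pattern: .* is greedy and runs to the LAST ] in the input.
    public static final String GREEDY = "\\Q[\\E((Bird)|(Animal)): .*\\Q]\\E";
    // Same pattern with .*? so the match stops at the FIRST ] it can.
    public static final String RELUCTANT = "\\Q[\\E((Bird)|(Animal)): .*?\\Q]\\E";

    // Replaces every match of regex with a single space, as in the question.
    public static String strip(String input, String regex) {
        return input.replaceAll(regex, " ");
    }

    public static void main(String[] args) {
        String input = "[Bird: Peacock] national bird [India], colorful. "
                + "[Bird: Crow] crow is black [Animal: Cow] cow gives milk";
        System.out.println("greedy:    " + strip(input, GREEDY));
        System.out.println("reluctant: " + strip(input, RELUCTANT));
    }
}
```

The greedy version keeps only the text after the last tag, while the reluctant version removes the three tags one by one and leaves [India] alone, because the pattern requires Bird or Animal right after the opening bracket.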
You can also use a second technique (preferred, because it needs less backtracking): instead of . which accepts any character (except line separators), match zero or more characters that are not ] (written as [^\\]]* ), which will give you
final String REG="\\Q[\\E((Bird)|(Animal)): [^\\]]*\\Q]\\E";
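As a rough check of the negated character class, the sketch below lists what the pattern matches in the question's input. The class NegatedClassDemo and the method findTags are hypothetical names; Pattern and Matcher are the standard java.util.regex classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are not from the question.
class NegatedClassDemo {
    // [^\]]* can never run past a closing bracket, so each match ends at its own ].
    public static final Pattern TAG =
            Pattern.compile("\\Q[\\E((Bird)|(Animal)): [^\\]]*\\Q]\\E");

    // Collects every substring of input that the pattern matches, in order.
    public static List<String> findTags(String input) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        String input = "[Bird: Peacock] national bird [India], colorful. "
                + "[Bird: Crow] crow is black [Animal: Cow] cow gives milk";
        System.out.println(findTags(input));
    }
}
```

For that input the list holds the three tags [Bird: Peacock], [Bird: Crow] and [Animal: Cow]; [India] is skipped.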
BTW, it is easier to escape the regex metacharacters [ and ] by adding \\ before them. \\Q and \\E are nice if you want to escape a long piece of text which could contain many metacharacters. So consider rewriting your regex as something a little shorter:
final String REG="\\[(Bird|Animal): [^\\]]*\\]";
or even
final String REG="\\[(Bird|Animal): [^\\]]*]";
because ] outside of a character class is not actually a metacharacter.
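The claim about ] outside a character class can be checked quickly. This is only a sketch: the class BracketEscapeDemo, its constants and the sample input are invented for the comparison.

```java
// Hypothetical demo class; the names and sample input are illustrative only.
class BracketEscapeDemo {
    // Closing bracket escaped as \\]
    public static final String ESCAPED = "\\[(Bird|Animal): [^\\]]*\\]";
    // Closing bracket left bare: outside a character class it is a literal.
    public static final String UNESCAPED = "\\[(Bird|Animal): [^\\]]*]";

    // Removes every match of regex from input.
    public static String strip(String input, String regex) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String input = "[Bird: Crow] crow [India] [Animal: Cow] cow";
        // Both patterns should produce the same result.
        System.out.println(strip(input, ESCAPED).equals(strip(input, UNESCAPED))); // prints true
    }
}
```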
One more thing: consider removing one of the spaces which surround each deleted [...] token. This way you will prevent the output for "[xx] foo [xx] bar [xx] baz" from ending up as " foo  bar  baz", with leftover extra spaces.
To do so, you can also remove the space that follows each removed [...] token (if such a space exists). Just add \\s? at the end of your regex, which will give you the (let's hope) final version of the regex:
final String REG="\\[(Bird|Animal): [^\\]]*]\\s?";
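Putting it together, here is a minimal sketch of the final regex applied to the input from the question. It assumes the replacement is an empty string rather than the " " used in the question, since the trailing \\s? already consumes one space; the class FinalRegexDemo and the method removeTags are hypothetical names.

```java
// Hypothetical demo class; the names here are not from the question.
class FinalRegexDemo {
    // Final pattern: the tag itself plus at most one whitespace character after it.
    public static final String REG = "\\[(Bird|Animal): [^\\]]*]\\s?";

    // Assumption: replace with "" (not " "), because \\s? removes the trailing space.
    public static String removeTags(String input) {
        return input.replaceAll(REG, "");
    }

    public static void main(String[] args) {
        String input = "[Bird: Peacock] national bird [India], colorful. "
                + "[Bird: Crow] crow is black [Animal: Cow] cow gives milk";
        System.out.println(removeTags(input));
    }
}
```

For this input it prints: national bird [India], colorful. crow is black cow gives milk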