Tags: java, regex, string, parsing, replaceall

Replacing HTML tags with regex (Java)


Say you have a string that contains the text of an HTML file and you run:

    content = content.replaceAll("<[^>]*>", "");

I know this removes essentially all the HTML tags. However, suppose I want to keep "tags" that are empty or contain only whitespace, such as:

    <> or < (any type/amount of blank space here) >

is it possible to modify the replaceAll to accomplish that? If so, how? Thanks for any input/suggestions.


Solution

  • content = content.replaceAll("<[^>]*[^\\s>][^>]*>", "");
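
    The middle character class [^\\s>] requires at least one character inside the angle brackets that is neither whitespace nor ">". Only tags with some non-whitespace content match and are removed; <> and tags containing only whitespace are left in place.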
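
To see the pattern in action, here is a minimal, self-contained sketch. The class name TagStripper, the method stripTags, and the sample input are made up for illustration; only the regex itself comes from the answer above. It precompiles the pattern, which should behave the same as calling replaceAll directly on the string.

```java
import java.util.regex.Pattern;

class TagStripper {
    // Same regex as in the answer: a '<', then a body containing at least one
    // character that is neither whitespace nor '>', then the closing '>'.
    private static final Pattern NON_EMPTY_TAG =
            Pattern.compile("<[^>]*[^\\s>][^>]*>");

    // Hypothetical helper: removes tags with content, keeps "<>" and "<   >".
    static String stripTags(String content) {
        return NON_EMPTY_TAG.matcher(content).replaceAll("");
    }

    public static void main(String[] args) {
        // Illustrative input only, not taken from the question.
        String content = "<p>Hello <b>world</b></p> <> and <   > stay";
        System.out.println(stripTags(content));
        // prints: Hello world <> and <   > stay
    }
}
```

The original pattern "<[^>]*>" would also have removed <> and <   > here, because [^>]* is allowed to match zero characters or only spaces.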