Tags: java, html, parsing, html-parsing

How to parse only HTML tags with regex in Java, without jsoup


Hi all, I need to parse only HTML tags with a regex, without jsoup: print the text inside real HTML tags, and print "none" for lines that are not HTML tags.

For example, given this input:

<h1> i love india </h1>
<xyz> name </xyz>
<html> hey i won! </html>
<syd> like it </syd>
<<<<<<
<br> love you <br>
>>>>>>>>

The expected output is:

i love india
none
hey i won!
none
none
love you
none

I have tried a lot but cannot get the exact answer. Can anyone help me out? Thanks in advance.


Solution

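  • Try the following. It prints the text between the first pair of tags on each line, or "none" when there is none. It does not check the tag name, so <xyz> and <syd> still print their text:

        import java.util.regex.Matcher;
        import java.util.regex.Pattern;

        public class TagText {
            public static void main(String[] args) {
                String[] array = { "<h1> i love india </h1>",
                                   "<xyz> name </xyz>",
                                   "<html> hey i won! </html>",
                                   "<syd> like it </syd>"
                                 };
                // Capture the run of non-bracket characters between '>' and '<'.
                Pattern pattern = Pattern.compile(">([^<>]+)<");
                for (String str : array) {
                    Matcher m = pattern.matcher(str);
                    if (m.find()) {
                        System.out.println(m.group(1).trim());
                    } else {
                        System.out.println("none");
                    }
                }
            }
        }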
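
To get "none" for unknown tags such as <xyz>, the regex has to be combined with a list of accepted tag names. Here is a minimal sketch of that idea, not a general HTML parser. The class name HtmlTagText, the method extract, and the contents of the HTML_TAGS allow-list are all assumptions made for illustration; the list is deliberately short and would need extending for real input. The pattern accepts a closing tag written with either / or \, and also a repeated opening tag as in the <br> line.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HtmlTagText {
    // Assumption: a small, non-exhaustive allow-list of HTML tag names; extend as needed.
    private static final Set<String> HTML_TAGS = new HashSet<>(Arrays.asList(
            "html", "head", "body", "title", "h1", "h2", "h3", "h4", "h5", "h6",
            "p", "div", "span", "a", "b", "i", "u", "br", "ul", "ol", "li",
            "table", "tr", "td", "th"));

    // Matches <name> text <name>, where the second tag may start with '/' or '\'
    // and must repeat the first tag's name (backreference \1).
    private static final Pattern LINE = Pattern.compile(
            "^\\s*<(\\w+)>([^<>]*)<[/\\\\]?\\1>\\s*$", Pattern.CASE_INSENSITIVE);

    /** Returns the trimmed inner text, or "none" if the line is not a known HTML tag pair. */
    static String extract(String line) {
        Matcher m = LINE.matcher(line);
        if (m.matches() && HTML_TAGS.contains(m.group(1).toLowerCase(Locale.ROOT))) {
            return m.group(2).trim();
        }
        return "none";
    }

    public static void main(String[] args) {
        String[] lines = {
            "<h1> i love india </h1>", "<xyz> name </xyz>", "<html> hey i won! </html>",
            "<syd> like it </syd>", "<<<<<<", "<br> love you <br>", ">>>>>>>>"
        };
        for (String line : lines) {
            System.out.println(extract(line));
        }
    }
}
```

Run on the seven sample lines from the question, this prints i love india, none, hey i won!, none, none, love you, none, one per line. It only handles a single tag pair per line with no attributes or nesting; anything beyond that is better left to a real parser.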