I need to extract the noun phrases from a parse tree, but I am unable to extract the nouns from the tree structure using a regex pattern.
Here is the tree structure:
(TOP (ADJP (JJ welcome) (PP (TO to) (NP (NNP Regular) (NNP Expression) (NNS learnings)))))
I need to extract all the words that fall under tags like NP, NNP, NNS, etc.; that is, I need to get the words Regular, Expression, and learnings using a regex pattern.
Can someone please help me with how to do this?
Not sure if this is what you wanted, but this will extract every word that sits directly before a closing parenthesis (for your tree that also includes welcome and to):
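import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Capture each word (optional capital, then lowercase letters) followed by ")".
Pattern regexpPattern = Pattern.compile("([A-Z]?[a-z]+)\\)");
Matcher m = regexpPattern.matcher("your string");
while (m.find()) {
    System.out.println(m.group(1));
}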
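To limit the output to words under noun tags (NN, NNS, NNP, NNPS), one option is to match the tag together with the word. This is only a minimal sketch, assuming the bracketed format shown in the question, where each leaf looks like "(TAG word)". The class name NounExtractor and the method extractNouns are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NounExtractor {
    // Matches a leaf such as "(NNP Regular)" and captures the word.
    // NNP?S? covers the tags NN, NNS, NNP and NNPS; phrase labels like NP do not match.
    private static final Pattern NOUN_LEAF =
            Pattern.compile("\\(NNP?S?\\s+([^\\s()]+)\\)");

    // Returns the noun-tagged words in the order they appear in the tree string.
    public static List<String> extractNouns(String tree) {
        List<String> nouns = new ArrayList<>();
        Matcher m = NOUN_LEAF.matcher(tree);
        while (m.find()) {
            nouns.add(m.group(1));
        }
        return nouns;
    }

    public static void main(String[] args) {
        String tree = "(TOP (ADJP (JJ welcome) (PP (TO to) (NP (NNP Regular) "
                + "(NNP Expression) (NNS learnings)))))";
        System.out.println(extractNouns(tree)); // prints [Regular, Expression, learnings]
    }
}
```

Capturing with [^\\s()]+ instead of [A-Z]?[a-z]+ also keeps words that contain digits, hyphens, or several capital letters. A regex only sees the flat string, so if you need the NP subtree as a unit rather than its individual nouns, a real tree parser is probably a better fit.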