I'm looking for Java-based tools for extracting relevant tags from a given article. I need a tool that tries to identify the main subjects and terms an article relates to. Thanks for helping.
You can use HtmlUnit to parse the article's HTML and query for the parts of the document you are interested in searching. Then you can apply a simple algorithm of your own design to determine tags/keywords.
For instance, split() the text on whitespace and then count how many times each word occurs. The most frequent words (ignoring common ones such as "and", "the", "if", etc.) are good candidates for keywords.
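Here is a rough sketch of that counting idea using only the standard library. The class name KeywordExtractor, the method extractTags, and the tiny stop-word list are all made up for illustration, and it assumes you already have the article as plain text (for example, after pulling it out of the HTML). Treat it as a starting point rather than a finished tagger.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class KeywordExtractor {

    // Illustrative stop-word list only; a real one would be much longer.
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "the", "if", "of", "to", "in", "is", "it", "on", "for"));

    // Returns up to maxTags words, most frequent first; ties are broken alphabetically.
    static List<String> extractTags(String text, int maxTags) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.split("\\s+")) {
            // Lower-case and drop punctuation so "Java," and "java" are counted together.
            String word = token.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{Nd}]", "");
            if (word.isEmpty() || STOP_WORDS.contains(word)) {
                continue;
            }
            counts.merge(word, 1, Integer::sum);
        }

        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });

        List<String> tags = new ArrayList<>();
        for (int i = 0; i < Math.min(maxTags, entries.size()); i++) {
            tags.add(entries.get(i).getKey());
        }
        return tags;
    }

    public static void main(String[] args) {
        String article = "Java tools and the Java parser: the parser reads Java.";
        System.out.println(extractTags(article, 2)); // prints [java, parser]
    }
}
```

Plain frequency counting favors words that are common everywhere, so the quality of the result depends heavily on how complete the stop-word list is.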