python lucene pylucene

How do I use StandardAnalyzer with TermQuery?


I'm trying to produce something similar to what Lucene's QueryParser does, but without the parser: run a string through StandardAnalyzer to tokenize it, then combine one TermQuery per token in a BooleanQuery. My problem is that StandardAnalyzer gives me Tokens, not Terms. I can convert a Token to a Term by extracting its string with Token.term(), but that method is 2.4.x-only, and it feels backwards because I have to specify the field a second time. What is the proper way of producing a TermQuery with StandardAnalyzer?

I'm using PyLucene, but I guess the answer is the same for Java. Here is the code I've come up with:

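from lucene import *

def term_match(self, phrase):
    # Build an OR query: one TermQuery per token produced by the analyzer.
    query = BooleanQuery()
    sa = StandardAnalyzer()
    # The field name is given to the analyzer here...
    for token in sa.tokenStream("contents", StringReader(phrase)):
        # ...and again when wrapping the token text in a Term.
        term_query = TermQuery(Term("contents", token.term()))
        query.add(term_query, BooleanClause.Occur.SHOULD)
    return query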

Solution

  • The established way to get the token text is token.termText(); that method has been in the API since long before 2.4.

    And yes, you'll need to specify a field name to both the Analyzer and the Term; I think that's considered normal. 8-)
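
A minimal sketch of how the two points above could be combined, assuming the token object exposes either term() (2.4.x, per the question) or termText(). The names token_text and field_terms, and the two stand-in token classes, are hypothetical and not part of PyLucene; they exist only so the example runs without a Lucene install. The field name is passed once and reused for every token.

```python
def token_text(token):
    """Return the token's text via whichever accessor the token exposes."""
    # Try term() first (2.4.x per the question), then termText().
    for accessor_name in ("term", "termText"):
        accessor = getattr(token, accessor_name, None)
        if callable(accessor):
            return accessor()
    raise AttributeError("token exposes neither term() nor termText()")


def field_terms(field, tokens):
    """Pair the field name, given once, with each token's text."""
    return [(field, token_text(token)) for token in tokens]


# Hypothetical stand-ins for Lucene Token objects (not PyLucene classes).
class OldStyleToken(object):
    def __init__(self, text):
        self._text = text

    def termText(self):
        return self._text


class NewStyleToken(object):
    def __init__(self, text):
        self._text = text

    def term(self):
        return self._text


pairs = field_terms("contents", [OldStyleToken("hello"), NewStyleToken("world")])
print(pairs)
```

With real tokens from sa.tokenStream(...), each (field, text) pair would presumably be wrapped as TermQuery(Term(field, text)) and added to the BooleanQuery, as in the question's code.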