Compare field with exact same field in Lucene search with multiple fields


I'm new to Lucene and I'm trying to compare the fields 'title' and 'description' of a given document with the same fields of the documents in a database.

The steps would be the following:

  1. Compare the field 'title' of the given document with the field 'title' of a document in my database.
  2. Compare the field 'description' of the given document with the field 'description' of the same document in my database.
  3. Combine both scores to create a similarity score for that document in the database.

Then I would loop over all the documents in the database and return the one with the highest similarity score.

Looking at Lucene's documentation, I don't think this is what happens when searching with MultiFieldQueryParser. It looks like it would compare the 'title' of the given document with both fields of the document in the database. Is this correct?

Is there another way I've missed to do this field-by-field comparison?

Thanks in advance :D


Solution

  • After a bit of research, I found out that 'MultiFieldQueryParser' does compare field by field when you use its static parse(String[] queries, String[] fields, Analyzer analyzer) method (the example on the wiki can be a bit confusing). So if you have 2 fields called 'title' and 'description', you can create a query like this:

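    // Let's assume you already have the documents in your directory and that
    // your searcher and indexer are already open.
    String[] fields = {"title", "description"};
    // Escape the raw text so characters such as ':' or '(' are not parsed as query syntax.
    String[] queries = {QueryParser.escape(document.getTitle()),
                        QueryParser.escape(document.getDescription())};
    // Optional per-field boosts. The instance method parse(String) applies them;
    // the static parse(String[], String[], Analyzer) used below does not read them.
    Map<String, Float> boosts = Map.of("title", 1.0f, "description", 1.0f);
    MultiFieldQueryParser parser = new MultiFieldQueryParser(fields, analyzer, boosts);
    try {
         // Static method: queries[i] is parsed against fields[i] only.
         Query query = MultiFieldQueryParser.parse(queries, fields, analyzer);
         .
         .
         .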
    

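    This way, each query string is matched only against its own field, and the per-field scores are combined into one similarity score for each document. With a 1:1 weighting, as here, both fields count the same in the comparison. Note that the boosts passed to the constructor are used by the instance method parse(String), not by the static parse(String[], String[], Analyzer) method; to weight the fields differently with this approach, you can wrap each field's query in a BoostQuery.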
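
For intuition only, the field-by-field comparison described in the question can be sketched without Lucene. This is not how Lucene scores documents: it swaps in a simple Jaccard token overlap as the per-field similarity, and the class `FieldwiseSimilarity`, its methods, and the map-based document representation are all hypothetical names made up for this illustration. It only shows the shape of the computation: each field is compared with the same field, the per-field scores are weighted and summed, and a loop keeps the best candidate.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy stand-in for Lucene scoring (hypothetical): compares each field only with the same field. */
final class FieldwiseSimilarity {

    /** Lowercases the text and splits it into a set of word tokens. */
    static Set<String> tokens(String text) {
        Set<String> result = new HashSet<>(Arrays.asList(text.toLowerCase().split("\\W+")));
        result.remove(""); // split() can produce an empty token, e.g. for empty input
        return result;
    }

    /** Jaccard overlap of the two token sets: |A ∩ B| / |A ∪ B|, or 0 if both are empty. */
    static double jaccard(String a, String b) {
        Set<String> ta = tokens(a);
        Set<String> tb = tokens(b);
        Set<String> union = new HashSet<>(ta);
        union.addAll(tb);
        if (union.isEmpty()) {
            return 0.0;
        }
        ta.retainAll(tb); // ta now holds the intersection
        return (double) ta.size() / union.size();
    }

    /** Steps 1 to 3: compare each boosted field with the same field, then take the weighted sum. */
    static double score(Map<String, String> given, Map<String, String> candidate,
                        Map<String, Float> boosts) {
        double total = 0.0;
        for (Map.Entry<String, Float> boost : boosts.entrySet()) {
            String field = boost.getKey();
            total += boost.getValue()
                    * jaccard(given.getOrDefault(field, ""), candidate.getOrDefault(field, ""));
        }
        return total;
    }

    /** Loops over all stored documents and returns the best match, or null if the list is empty. */
    static Map<String, String> mostSimilar(Map<String, String> given,
                                           List<Map<String, String>> stored,
                                           Map<String, Float> boosts) {
        Map<String, String> best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map<String, String> candidate : stored) {
            double s = score(given, candidate, boosts);
            if (s > bestScore) {
                bestScore = s;
                best = candidate;
            }
        }
        return best;
    }
}
```

In this sketch, a stored document whose title and description are swapped relative to the given one gets no credit, because a title is never compared with a description. That is the same behaviour the per-field query aims for.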