How to find similar documents

How do you find documents similar to a given document in Lucene? I do not know what the text is; I only know which document it is. Is there a way to find similar documents in Lucene? I am a newbie, so I may need some hand-holding.


  • You may want to check the MoreLikeThis feature of Lucene.

    MoreLikeThis constructs a Lucene query from terms within a document in order to find other, similar documents in the index.

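    Sample code (Java). Here reader, searcher, and docId are assumed to be an open IndexReader, an IndexSearcher over the same index, and the Lucene document number of the source document. The import path shown is the one used in Lucene 4 and later.

    import org.apache.lucene.queries.mlt.MoreLikeThis;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TopDocs;

    MoreLikeThis mlt = new MoreLikeThis(reader); // Pass the index reader
    mlt.setFieldNames(new String[] {"title", "author"}); // Specify the fields used for similarity
    Query query = mlt.like(docId); // Pass the doc id
    TopDocs similarDocs = searcher.search(query, 10); // Use the searcher; keep the top 10 hits
    if (similarDocs.scoreDocs.length == 0) {
        // Handle the case where no similar documents were found
    }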
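
If it helps to see the idea without Lucene in the way, here is a rough plain-Java sketch of "build a query from a document's terms". It is only an illustration and not how Lucene implements MoreLikeThis, which does considerably more. The class and method names (MoreLikeThisSketch, tokenize, topTerms, similar) are made up for this example, and the tf-idf weighting and overlap count are simplifying assumptions.

```java
import java.util.*;

public class MoreLikeThisSketch {
    // Lowercase the text and split it on anything that is not a letter or digit.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Pick up to maxTerms terms of the source document with the highest tf * idf.
    public static List<String> topTerms(int docId, List<String> corpus, int maxTerms) {
        // Document frequency: in how many documents each term occurs.
        Map<String, Integer> df = new HashMap<>();
        for (String doc : corpus) {
            for (String t : new HashSet<>(tokenize(doc))) df.merge(t, 1, Integer::sum);
        }
        // Term frequency inside the source document.
        Map<String, Integer> tf = new HashMap<>();
        for (String t : tokenize(corpus.get(docId))) tf.merge(t, 1, Integer::sum);

        Map<String, Double> weight = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            double idf = Math.log((double) corpus.size() / df.get(e.getKey()));
            weight.put(e.getKey(), e.getValue() * idf);
        }
        // Highest weight first; ties are broken alphabetically for stable output.
        List<String> terms = new ArrayList<>(weight.keySet());
        terms.sort((a, b) -> {
            int c = Double.compare(weight.get(b), weight.get(a));
            return c != 0 ? c : a.compareTo(b);
        });
        return new ArrayList<>(terms.subList(0, Math.min(maxTerms, terms.size())));
    }

    // Rank the other documents by how many of the selected terms they contain.
    public static List<Integer> similar(int docId, List<String> corpus, int maxTerms) {
        List<String> terms = topTerms(docId, corpus, maxTerms);
        Map<Integer, Integer> scores = new HashMap<>();
        for (int i = 0; i < corpus.size(); i++) {
            if (i == docId) continue; // skip the source document itself
            Set<String> tokens = new HashSet<>(tokenize(corpus.get(i)));
            int hits = 0;
            for (String t : terms) if (tokens.contains(t)) hits++;
            if (hits > 0) scores.put(i, hits);
        }
        // Most matching terms first; ties are broken by document id.
        List<Integer> ids = new ArrayList<>(scores.keySet());
        ids.sort((a, b) -> {
            int c = Integer.compare(scores.get(b), scores.get(a));
            return c != 0 ? c : Integer.compare(a, b);
        });
        return ids;
    }

    public static void main(String[] args) {
        List<String> corpus = Arrays.asList(
            "lucene index search lucene query",
            "lucene query parser and index search",
            "cooking pasta with tomato sauce",
            "gardening tips for tomato plants");
        System.out.println(similar(0, corpus, 2)); // prints [1]
    }
}
```

Notice that you only pass a document id, never the text, which matches the situation in the question. In a real index you would let MoreLikeThis pick the terms and let the searcher do the scoring instead of counting overlaps by hand.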