Search code examples
azure-cognitive-search

Is it possible to get a list of similar and/identical documents?


This is a general question that would like to get some input from the search community, so I don't have a piece of code to share just yet.

The objective is for a single document to get a list of similar and/or identical documents indexed by Azure Search - is that possible?

So given a document_id = 1 how do I get a list of the most similar documents to the specified id in the index? Ideally the outcome would be a list of documents order by a match of 0-100 - where 100 (%) would be an identical match.

I considering maybe taking the content of a given document and submitting that as part of the search, but that doesn't seem to be very elegant and it is also error prone in terms of constructing the query and the size of a document can be significant.

Thank you in advance for any suggestions or comments.


Solution

  • You could try using the preview feature "moreLikeThis" -> https://learn.microsoft.com/en-us/azure/search/search-more-like-this

    I believe that's the closest Azure Search has to offer to what you want.

    Edit 1: Be advised that this feature has limitations like non-support for complex types. Make sure it meets your requirements before taking a production dependency.