python django sentence-similarity

Django: Filter Items similar to given Item


I have a Django backend (PostgreSQL database).

Suppose a table, say A, has a CharField called 'message'. I want to find all items in A whose 'message' is similar to the 'message' field of a given instance, with similarity computed by some algorithm. TL;DR: I want to find items based on item-item similarity.

The question has 3 parts:

  1. How can I do it? Can I do it in real time (slow), or will I have to precompute the similarity between all pairs of items in table A? (This might blow up my DB.)

  2. How can I find the similarity between 'message' fields? Note that each item is more like a 400-character post than a group of keywords. I've come across many algorithms that calculate string distance, but I don't think they will cut it. I think something like TF-IDF followed by cosine similarity is more appropriate.

  3. How do I achieve the above in a production setting? That is, what data structure should I use to balance request response time against storage?


Solution

  • This might do the trick:

    http://django-haystack.readthedocs.org/en/v2.4.1/searchqueryset_api.html#more-like-this

    SearchQuerySet.more_like_this(self, model_instance)
    

    You can pass in an instance of the model to fetch results similar to it.
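
  • If you want to see what the TF-IDF plus cosine similarity idea from the question looks like on its own, here is a minimal sketch in plain Python. It is not what Haystack does internally. The helper names (tokenize, tfidf_vectors, cosine, most_similar) and the sample messages are made up for illustration, and the smoothed IDF formula is just one common choice:

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty messages
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(target_index, documents, top_n=3):
    """Return up to top_n (index, score) pairs, most similar first."""
    vectors = tfidf_vectors(documents)
    target = vectors[target_index]
    scored = [
        (cosine(target, vec), i)
        for i, vec in enumerate(vectors)
        if i != target_index
    ]
    scored.sort(reverse=True)
    return [(i, score) for score, i in scored[:top_n]]


# Hypothetical sample data standing in for the 'message' column of table A.
messages = [
    "The server crashed after the database migration",
    "Database migration caused the server to crash",
    "I love hiking in the mountains on weekends",
]
ranked = most_similar(0, messages)
```

    In a Django project the messages could come from something like A.objects.values_list('message', flat=True). This sketch recomputes every vector on each call, so it only suits small tables. For anything larger you would likely cache the vectors or hand the work to a search backend, as above.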