Tags: python, django, python-2.7, django-queryset, django-q

Sorting a reduce query with multiple terms and Q filters


I am trying to write a search function that queries multiple attributes of a model. To make matters a bit tougher, I want to support multiple search terms (collected with a list comprehension) and then sort the results so that the closest matches come first.

For example, if the search terms were ['green', 'shoe'] and I had an object named 'green shoe', I would want that to be the first item in my result, followed by 'black shoe' or 'green pants'.

Here is what I have so far. It extracts the search terms from the query parameter and then runs the Q queries.

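import operator
from functools import reduce  # a builtin on Python 2; imported so the snippet also runs on Python 3

from django.db.models import Q

# Item is the model being searched; import it from your own app.

def get_queryset(self):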
    search_terms = self.request.GET.getlist('search', None)
    terms = []
    x = [terms.extend(term.lower().replace('/', '').split(" ")) 
         for term in search_terms]
    # x is useless, but it is just better to look at. 
    results = reduce(operator.or_, 
                     (Item.objects.filter(Q(name__icontains=term) | 
                                          Q(description__icontains=term) | 
                                          Q(option__name__icontains=term)) 
                      for term in terms))
    return results

This returns ['black shoe', 'green pants', 'green shoe'], which contains all of the matching results but in the wrong order.
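As a side note, the term-splitting step can be written without the throwaway `x` list. This is only a sketch: `normalize_terms` is a made-up helper name, and it uses `split()` rather than `split(" ")`, so repeated spaces do not produce empty terms (a small behaviour change from the code above).

```python
def normalize_terms(search_terms):
    """Flatten raw search strings into a flat list of lowercase words.

    Hypothetical helper, not part of the original view. Slashes are
    removed, as in the original code, before splitting on whitespace.
    """
    return [
        word
        for raw in search_terms
        for word in raw.lower().replace('/', '').split()
    ]


terms = normalize_terms(['Green  Shoe', 'red/pants'])
```

The nested comprehension reads left to right like two nested for loops, so the result is built directly instead of through `terms.extend(...)` side effects.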

I realize I could skip splitting the search string into multiple terms, but then I would only get the one exact result and miss the other similar items.

Thanks for looking

Edit 1

So after the first answer I started to play around with it. This now produces the result I want, but I feel like it may be just terrible because it converts the queryset to a list. Let me know what you think:

def get_queryset(self):
    search_terms = self.request.GET.getlist('search', None)
    if not search_terms or '' in search_terms or ' ' in search_terms:
        return []
    terms = [term.lower().replace('/', '').split(" ") for term in search_terms][0]
    results = reduce(operator.or_,
                     (Item.objects.filter
                      (Q(name__icontains=term) | Q(description__icontains=term) | Q(option__name__icontains=term))
                      for term in terms))

    # creating a list so I can index later
    # Couldn't find an easy way to index on a generator/queryset
    results = list(results)

    # Using enumerate so I can get the index, storing index at end of list for future reference
    # Concats the item name and the item description into one list, using that for the items weight in the result
    results_split = [t.name.lower().split() + t.description.lower().split() + list((x,)) for x, t in enumerate(results)]
    query_with_weights = [(x, len(search_terms[0].split()) - search_terms[0].split().index(x)) for x in terms]
    get_weight = lambda x: ([weight for y, weight in query_with_weights if y==x] or [0])[0]
    sorted_results = sorted([(l, sum([(get_weight(m)) for m in l])) for l in results_split], key=lambda lst: lst[1], reverse=True)

    # Building the final list based off the sorted list and the index of the items.
    final_sorted = [results[result[0][-1]] for result in sorted_results]
    print results_split
    print query_with_weights
    print final_sorted
    return final_sorted

A query of [red, shoes, pants] would print out this:

# Combined name and description of each item
[[u'red', u'shoe', u'sweet', u'red', u'shoes', u'bro', 0], [u'blue', u'shoe', u'sweet', u'blue', u'shoes', u'bro', 1], [u'red', u'pants', u'sweet', u'red', u'pants', u'bro', 2], [u'blue', u'pants', u'sweet', u'blue', u'pants', u'bro', 3], [u'red', u'swim', u'trunks', u'sweet', u'red', u'trunks', u'bro', 4]]

# Weighted query
[(u'red', 3), (u'shoes', 2), (u'pants', 1)]

# Final list of sorted items from queryset
[<Item: Red Shoe>, <Item: Red Pants>, <Item: Red Swim Trunks>, <Item: Blue Shoe>, <Item: Blue Pants>]
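The same ordering can probably be reached without appending the index to each word list, by sorting the items directly with a key function. This is a rough sketch under some assumptions: `Item` here is a namedtuple standing in for the real model (only `name` and `description` are used), and `rank_items` is a hypothetical helper name.

```python
from collections import namedtuple

# Hypothetical stand-in for the Django model, for illustration only.
Item = namedtuple('Item', ['name', 'description'])


def rank_items(items, query):
    """Return items sorted so that the best matches for query come first."""
    words = query.lower().split()
    # Earlier query words get larger weights; a repeated word keeps
    # the weight of its first occurrence, like list.index() would.
    weights = {}
    for position, word in enumerate(words):
        weights.setdefault(word, len(words) - position)

    def score(item):
        tokens = item.name.lower().split() + item.description.lower().split()
        return sum(weights.get(token, 0) for token in tokens)

    # sorted() is stable, so items with equal scores keep their input order.
    return sorted(items, key=score, reverse=True)


items = [
    Item('Red Shoe', 'sweet red shoes bro'),
    Item('Blue Shoe', 'sweet blue shoes bro'),
    Item('Red Pants', 'sweet red pants bro'),
    Item('Blue Pants', 'sweet blue pants bro'),
    Item('Red Swim Trunks', 'sweet red trunks bro'),
]
ranked_names = [item.name for item in rank_items(items, 'red shoes pants')]
```

With the sample data from the printout, this gives the same order as the final list: Red Shoe, Red Pants, Red Swim Trunks, Blue Shoe, Blue Pants. It still evaluates the whole result set in Python, so it does not avoid loading the queryset into memory.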

Solution

  • This is not exactly a QuerySet problem.

    This needs a separate algorithm that decides the ordering of the result set you build. I would write a new algorithm for the ordering, possibly a whole set of them, because the right ordering depends on the category of the query.

    For now, I would add a weight to every result in the result set that reflects how close it is to the query, based on a few parameters.

    In your case, your parameters would be as follows:

    • How many words matched?
    • The words that appear first should get the highest priority
    • Any query that matches fully should have the highest priority as well
    • The words on the far end of the query should have lowest priority

    Anyway, that is an idea to begin with; I am sure your real version will end up more complex.

    So here's the code to create the ordering:

    query = 'green shoe'
    query_with_weights = [(x, len(query.split()) - query.split().index(x)) for x in query.split()]
    results = ['black pants', 'green pants', 'green shoe']
    results_split = [res.split() for res in results]
    
    get_weight = lambda x: ([weight for y, weight in query_with_weights if y==x] or [0])[0]
    sorted_results = sorted([ (l, sum([( get_weight(m)) for m in l])) for l in results_split], key = lambda lst: lst[1], reverse=True)
    print('sorted_results={}'.format(sorted_results))
    

    Once you try this, you will get the following results:

    sorted_results=[(['green', 'shoe'], 3), (['green', 'pants'], 2), (['black', 'pants'], 0)]

    I hope this explains the point. However, this algorithm only works for simple text. You might have to adapt it to your domain (electrical items, for example, if your website deals in them), and sometimes you may have to look at properties of the object itself. This should be a good starting point.
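The snippet above only uses positional word weights. If you also want a full match to come first and the number of matched words to count, one possible approach is a tuple sort key, since Python compares tuples element by element. This is just a sketch: `sort_key` and `order_results` are made-up names, and the priority order of the three criteria is an assumption you may want to change.

```python
def sort_key(result, query):
    """Build a (full match, words matched, positional weight) key."""
    query_words = query.lower().split()
    result_words = result.lower().split()
    # Earlier query words weigh more, as in the snippet above.
    weights = {}
    for position, word in enumerate(query_words):
        weights.setdefault(word, len(query_words) - position)
    matched = set(result_words) & set(query_words)
    return (
        result_words == query_words,           # exact match beats everything
        len(matched),                          # then distinct query words matched
        sum(weights[word] for word in matched) # then positional weight
    )


def order_results(results, query):
    """Sort results from best to worst match for query."""
    return sorted(results, key=lambda r: sort_key(r, query), reverse=True)


ordered = order_results(
    ['black pants', 'green pants', 'shoe green', 'green shoe', 'black shoe'],
    'green shoe',
)
```

Here 'green shoe' comes first as the exact match, then 'shoe green' (both words, wrong order), then 'green pants', 'black shoe', and finally 'black pants'.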