I am trying to write a search function that queries multiple attributes of a model. To make matters a bit tougher, I want to handle multiple terms inside a list comprehension and then sort the results so that the closest matches come first.
For example, if the search terms were ['green', 'shoe'] and I had an object named 'green shoe', I would want that to be the first item in my result, followed by 'black shoe' or 'green pants'.
Here is what I have so far that extracts the search terms from the query param and then runs the Q queries.
import operator
from functools import reduce

from django.db.models import Q

def get_queryset(self):
    search_terms = self.request.GET.getlist('search', None)
    terms = []
    x = [terms.extend(term.lower().replace('/', '').split(" "))
         for term in search_terms]
    # x is useless, but it is just better to look at.
    results = reduce(operator.or_,
                     (Item.objects.filter(Q(name__icontains=term) |
                                          Q(description__icontains=term) |
                                          Q(option__name__icontains=term))
                      for term in terms))
    return results
This would return ['black shoe', 'green pants', 'green shoe'], which is out of order but does contain all of the matching results.
I realize I could leave the search string unsplit and get only one result, but then I wouldn't get the other, similar items either.
Thanks for looking
Edit 1
So after the first answer I started to play around with it. This now produces the result I want, but I worry it may be terrible because it converts the queryset to a list. Let me know what you think:
def get_queryset(self):
    search_terms = self.request.GET.getlist('search', None)
    if not search_terms or '' in search_terms or ' ' in search_terms:
        return []
    terms = [term.lower().replace('/', '').split(" ") for term in search_terms][0]
    results = reduce(operator.or_,
                     (Item.objects.filter(Q(name__icontains=term) |
                                          Q(description__icontains=term) |
                                          Q(option__name__icontains=term))
                      for term in terms))
    # creating a list so I can index later
    # Couldn't find an easy way to index on a generator/queryset
    results = list(results)
    # Using enumerate so I can get the index, storing index at end of list for future reference
    # Concats the item name and the item description into one list, using that for the items weight in the result
    results_split = [t.name.lower().split() + t.description.lower().split() + [x]
                     for x, t in enumerate(results)]
    # Weights come from the normalized terms, so mixed-case queries still match.
    query_with_weights = [(x, len(terms) - terms.index(x)) for x in terms]
    get_weight = lambda x: ([weight for y, weight in query_with_weights if y == x] or [0])[0]
    sorted_results = sorted([(l, sum(get_weight(m) for m in l)) for l in results_split],
                            key=lambda lst: lst[1], reverse=True)
    # Building the final list based off the sorted list and the index of the items.
    final_sorted = [results[result[0][-1]] for result in sorted_results]
    print(results_split)
    print(query_with_weights)
    print(final_sorted)
    return final_sorted
A query of [red, shoes, pants] would print out this:
# Combined name and description of each item
[[u'red', u'shoe', u'sweet', u'red', u'shoes', u'bro', 0], [u'blue', u'shoe', u'sweet', u'blue', u'shoes', u'bro', 1], [u'red', u'pants', u'sweet', u'red', u'pants', u'bro', 2], [u'blue', u'pants', u'sweet', u'blue', u'pants', u'bro', 3], [u'red', u'swim', u'trunks', u'sweet', u'red', u'trunks', u'bro', 4]]
# Weighted query
[(u'red', 3), (u'shoes', 2), (u'pants', 1)]
# Final list of sorted items from queryset
[<Item: Red Shoe>, <Item: Red Pants>, <Item: Red Swim Trunks>, <Item: Blue Shoe>, <Item: Blue Pants>]
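For comparison, here is a rough sketch of the same ranking without the index bookkeeping: `sorted` can take the objects themselves and compute each score in its `key` function. `Item` is a namedtuple stand-in for the model so the snippet runs without Django, `weigh` and `rank` are hypothetical helper names, and the sample rows are assumed from the names and descriptions printed above.

```python
from collections import namedtuple

# Stand-in for the model so the sketch runs without Django.
Item = namedtuple('Item', ['name', 'description'])


def weigh(query):
    """Map each query word to a weight; earlier words weigh more."""
    words = query.lower().split()
    return {word: len(words) - words.index(word) for word in words}


def rank(results, query):
    weights = weigh(query)

    def score(item):
        # Combine name and description into one list of lowercase words.
        text = (item.name + ' ' + item.description).lower().split()
        return sum(weights.get(word, 0) for word in text)

    # sorted() is stable: items with equal scores keep their incoming order.
    return sorted(results, key=score, reverse=True)


results = [
    Item('Red Shoe', 'Sweet red shoes bro'),
    Item('Blue Shoe', 'Sweet blue shoes bro'),
    Item('Red Pants', 'Sweet red pants bro'),
    Item('Blue Pants', 'Sweet blue pants bro'),
    Item('Red Swim Trunks', 'Sweet red trunks bro'),
]
print([item.name for item in rank(results, 'red shoes pants')])
```

Because the key function receives each object directly, nothing has to be appended to the word lists and looked up again afterwards. The result is still a plain list rather than a queryset.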
This is not exactly a QuerySet problem.
This needs a separate algorithm that decides the ordering of the result set you create. I would write a new algorithm for the ordering, possibly a whole array of algorithms, because your results would depend on the category of the query itself.
For now, I can think of adding a weight to every result in the result set that reflects how close it is to the query, based on some parameters.
In your case, your parameters would be as follows:
Anyway, that is an idea to begin with; I am sure yours will end up more complex.
So here's the code to create the ordering:
query = 'green shoe'
query_with_weights = [(x, len(query.split()) - query.split().index(x)) for x in query.split()]
results = ['black pants', 'green pants', 'green shoe']
results_split = [res.split() for res in results]
get_weight = lambda x: ([weight for y, weight in query_with_weights if y==x] or [0])[0]
sorted_results = sorted([(l, sum(get_weight(m) for m in l)) for l in results_split], key=lambda lst: lst[1], reverse=True)
print('sorted_results={}'.format(sorted_results))
Once you try this, you will get the following results:
sorted_results=[(['green', 'shoe'], 3), (['green', 'pants'], 2), (['black', 'pants'], 0)]
I hope this explains the point. However, this algorithm will only work for simple text. You might have to adapt it, for example for electrical items, if your website depends on them. Sometimes you may have to look into properties of the object itself. This should be a good starting point.
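As one illustration of looking at the object's properties, a match in the name could count for more than a match in the description. The sketch below is only an assumption about how that might look: `Item` is a namedtuple stand-in rather than the Django model, `FIELD_WEIGHTS`, `term_weights`, `score` and `rank` are hypothetical names, the multipliers are arbitrary, and the sample items are made up for the example.

```python
from collections import namedtuple

# Stand-in for the model; only the two text fields matter here.
Item = namedtuple('Item', ['name', 'description'])

# Hypothetical multipliers: a hit in the name counts double.
FIELD_WEIGHTS = {'name': 2, 'description': 1}


def term_weights(query):
    """Earlier words in the query get a higher weight."""
    words = query.lower().split()
    weights = {}
    for position, word in enumerate(words):
        # setdefault keeps the weight of the first occurrence of a repeated word.
        weights.setdefault(word, len(words) - position)
    return weights


def score(item, weights):
    """Sum term weights over each field, scaled by that field's multiplier."""
    total = 0
    for field, multiplier in FIELD_WEIGHTS.items():
        for word in getattr(item, field).lower().split():
            total += multiplier * weights.get(word, 0)
    return total


def rank(items, query):
    weights = term_weights(query)
    # sorted() is stable, so ties keep their original order.
    return sorted(items, key=lambda item: score(item, weights), reverse=True)


items = [
    Item('black pants', 'plain pants'),
    Item('green pants', 'comfy pants'),
    Item('plain shoe', 'a green shoe for hiking'),
    Item('green shoe', 'a shoe'),
]
ranked = rank(items, 'green shoe')
print([item.name for item in ranked])
```

With these made-up multipliers, 'green shoe' scores highest because both query words appear in its name, while 'plain shoe' still outranks 'green pants' thanks to the hits in its description. Different categories of items would likely need different fields and multipliers.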