Search code examples
mongodbpymongoflask-mongoengine

Sort MongoDB query by additional keyword


I am building a REST Api on Flask + MongoDB for a HR website. I need to filter search results by programming language, experience, city etc.

I also have a filter 'framework'. But I also need to implement an additional framework filter which is "nice to have, but not necessary" I just need to sort the items order including the additional framework.

The framework itself is listed in the 'about' field as a string, so I am using a regex to find it.

For example, I am looking for a Python developer who must have a Flask knowledge, but it is nice to know React too. So I need to have a search result like this:

  1. Dev A: Python, Flask, React
  2. Dev B: Python, Flask, React
  3. Def C: Python, Flask

The problem is that this second framework is not mandatory, when the first is so I can't query with '$or' or '$in'. The only option I see is that I query for the first framework and then sort it by presence of the second framework.

Is there a way to implement it in MongoDB?


Solution

  • OK, so after doing some research I was able to find a solution:

    def _filter_by_additional_framework(collection, query):
        results = collection.aggregate([
            {'$limit': 1},
            {'$lookup':
                {
                    'from': 'profiles',
                    'pipeline': [
                        {'$match': {
                            '$text': {
                                '$search': f"\"{query['framework']}\" \"{query['framework2']}\""}}
                        },
                        {'$sort': {'Rating': DESCENDING}}
                    ], 'as': 'col1'}},
            {'$lookup':
                {
                    'from': 'profiles',
                    'pipeline': [
                        {'$match': {
                            '$text': {
                                '$search': f"\"{query['framework']}\" -\"{query['framework2']}\""}}
                        },
                        {'$sort': {'Rating': DESCENDING}}
                    ], 'as': 'col2'}},
            {'$project': {'union': {'$concatArrays': ["$col1", "$col2"]}}},
            {'$unwind': '$union'},
            {'$replaceRoot': {'newRoot': '$union'}},
            {'$project': search_result_order()},
        ], collation=Collation(locale="en_US", numericOrdering=True))
    
        return results
    

    It performs 2 lookups by the same field in a collection (first one looking for say Flask AND React ('"Flask" "React"') and the second one Flask AND NOT React ('"Flask" -"React"'). I also had to add text indexes to some fields to perform a text lookup.

    Also, you may notice that there is a $sort expression in each pipeline. This is for additional sort (Rating in my case). And the second argument (collation) in aggregate is for sorting a string field by integer. So as a result you can sort every query as you would like to separately or you can move your sort expression out of the pipeline and sort the whole query combined.