Search code examples
node.jsperformancemongoose

Multiple queries without duplicate results, limit them together and sort in custom priority


I'm struggling to find an efficient way to run multiple queries without duplicates and then limit them together without disrupting the manual sort order.

Let's say I have the following docs:

[
    { 'name': 'John1', 'keywords': [ 'john' ] },
    { 'name': 'John2', 'keywords': [ 'john' ] },
    { 'name': 'James1', 'keywords': [ 'james' ] },
    { 'name': 'James2', 'keywords': [ 'james' ] },
    { 'name': 'Jo1', 'keywords': [ 'jo' ] },
    { 'name': 'Jo2', 'keywords': [ 'jo' ] },
]

The first query is an exact match, ie. if the user searches 'John1' it will return the first doc.

User.findOne({ 'name': query })

The second query is a keyword match, ie. if the user searches 'John1', it will calculate a keyword to search of 'john' and then return the first 2 docs. (keywordSearch is pre-calculated and will reliably find the name within the query)

User.find({ 'keywords': keywordSearch })

The third query is a containing match, ie. if the user searches 'J', it will return all the listed docs.

User.find({ 'name': { '$regex' : query, '$options' : 'i' } })

Problem 1: Duplicates

If I query 'John1' I'll get the following results:

{ 'username': 'John1', 'keywords': [ 'john' ] },
{ 'username': 'John1', 'keywords': [ 'john' ] },
{ 'username': 'John2', 'keywords': [ 'john' ] },

I expect the amount of results in production to be too high to iterate through and remove duplicates from each array of results without serious performance issues.

Problem 2: Limiting

If I use .limit(), I have to limit the queries separately and that can produce an unpredictable amount of results. They need to be limited together for pagination.

Problem 3: Sorting

It's important that the results are displayed in the same order that the queries are run: exact match as the first result, keyword results as the second priority and then containing results as third priority. I imagine that if I were to use $or to select my 3 conditions, it would sort them based on a default field and disrupt my desired order.

Desired Results

Imagine that the exact match, keyword match & containing match are in one array of results, in the order displayed. I have separated & titled them below for visual aid.

Example query: 'John1'

Exact Match:

{ 'name': 'John1', 'keywords': [ 'john' ] }

Keyword Match:

{ 'name': 'John2', 'keywords': [ 'john' ] }

Containing Match: null

Example query: 'James2'

Exact Match:

{ 'name': 'James2', 'keywords': [ 'james' ] }

Keyword Match:

{ 'name': 'James1', 'keywords': [ 'james' ] }

Containing Match: null

Example query: 'Jo'

Exact Match: null

Keyword Match:

{ 'name': 'Jo1', 'keywords': [ 'jo' ] }
{ 'name': 'Jo2', 'keywords': [ 'jo' ] }

Containing Match:

{ 'name': 'John1', 'keywords': [ 'john' ] },
{ 'name': 'John2', 'keywords': [ 'john' ] }

Example query: 'J'

Exact Match: null

Keyword Match: null

Containing Match:

{ 'name': 'John1', 'keywords': [ 'john' ] },
{ 'name': 'John2', 'keywords': [ 'john' ] },
{ 'name': 'James1', 'keywords': [ 'james' ] },
{ 'name': 'James2', 'keywords': [ 'james' ] }
{ 'name': 'Jo1', 'keywords': [ 'jo' ] },
{ 'name': 'Jo2', 'keywords': [ 'jo' ] },

I hope that I have explained my scenario well enough and if not, please let me know any additional information that would help. Thanks a lot in advance.


Solution

  • My final solution was to use a page number based calculation and to use .where matches to exclude the results it had already found. ie. on keyword matches, I added a second condition that it isn't an exact match. On containing matches, I added a second and third condition that it isn't an exact match and it isn't a keyword match.

    On page 1 only, it searches for an exact match.

    It then counts the total keyword results and calculates how many pages worth there are, and if the current page is higher than that it fetches containing results. If the current page is equal to the amount of keyword result pages, it calculates how many of them are on the last page and if that's not equal to the page limit, it will also find the required amount of containing results to fill the page.

    This was really complex and I feel this didn't need to be anywhere near as difficult as it was, but ah well.