Search code examples
ruby-on-railsrubymongodbsinatrakaminari

Union two mongo results


I have a situation where I need to return a single collection of objects from mongo, but need to use two queries to get the results. The order of these results is important because they are paginated.

Here's the first query: (listings based on a category and a price range)

my_listings = MoListing.where(criteria_a)

The second query needs to use the results of the first query as a filter. So something like:

everything_else = MoListing.where(criteria_b)

Then union the results:

my_listings << everything_else

And finally, return paginated results:

my_listings.page(1).per(25)

It seems that part of my issue is that mongo queries are not executed until they are needed. Is there a way for me to trigger the execution of a query at a given point? Or is there another approach I should take in building this result set?

Update with more info

The behavior I'm seeing is that what gets returned is just the results in listings. I have also confirmed that everything_else does contain the expected records (48 records in my_listings, 52 in everything_else as expected).

When applying .all to my queries as mentioned in the comments, no impact is made. A puts listings.inspect results in

10:57:00 web.1   |    #<Mongoid::Criteria
10:57:00 web.1   |    selector: {"price"=>{"$gte"=>25, "$lte"=>75}},
10:57:00 web.1   |    options:  {},
10:57:00 web.1   |    class:    MoListing,
10:57:00 web.1   |    embedded: false>

However, listings.count does result in 48. Am I just missing some stupid simple way of merging these results? And once I do have the results in one collection, how will this impact the pagination functions that follow. I'm using kaminari for pagination.

Update 2

Per an answer below and my own trial and error, I've found to_a to be a solution, but not an ideal one. This does function:

#merge the results together as an Array
results = (listings.to_a | everything_else.to_a)

This causes the pagination via Kaminari to have to change as we are no longer working with mongo criteria, but instead, a standard Array. Here's the new pagination method:

Kaminari.paginate_array(results).page(page).per(per_page)

Working with a small dataset of 100 records, this is fine and dandy - 54ms

"debug":{"success":true,"pre_render_duration":54.808775999999995,"overall_duration":86.36554100000001,"count":25},"pagination":{"total_pages":4,"current_page":1}}

However, using a larger dataset I'm seeing significantly slower times when using the .to_a method to combine these. Although the examples are not exactly apples to apples, this large of a difference points to the issue being with to_a returning everything, forcing Kaminari to work with a lot more actual data:

My results without to_a, simply returning all records with criteria applied - 15ms

"debug":{"success":true,"pre_render_duration":15.107164,"overall_duration":18.267599,"count":25},"pagination":{"total_pages":81,"current_page":1}}

My results with to_a, merging two resultsets - 415ms

"debug":{"success":true,"pre_render_duration":415.258199,"overall_duration":450.66537800000003,"count":25},"pagination":{"total_pages":81,"current_page":1}}

To summarize, this is not a valid option. Returning each dataset individually takes <15ms even with a large dataset, so I think what I need to accomplish is a way to merge the criteria together so that a single query is run against Mongo, allowing pagination to happen on the db, where it should be.

In SQL I would do something roughly like

select
  *
from
  listings
where
  field = "blah"
union all
select
  *
from
  listings
where
  field <> "blah"

Is it possible to do this in Mongo?


Solution

  • Maybe you could create a class to encapsulate details on how data is retrieved for that specific array and, by using Mongo driver, you could skip and limit query options to reduce transferred data sizes.

    Using this approach, you could use something like this (namings not very good and i didnt tested the code, but you'll take the point):

    class MoListingDataRetriever
      def initialize(page_size)
        @page_size = page_size / 2 #since you'll have two queries
        driver_instance = MoListing.db #just an exemple. You could use any of your classes that are mongo documents to do this
        @collection_driver = driver_instance.collection("mo_listing") #or whatever you collection name is on mongo
      end
    
      def retrieve_mo_listings(query_param_a, query_param_b, current_page)
        query_options = {
          limit: @page_size,
          page: current_page,
          skip: (@page_size * (current_page - 1)) #to skip a number of records already retrieved from the query
        }
        results_from_query_a = @driver_instance.find(query_param_a, query_options)
        results_from_query_b = @driver_instance.find(query_param_a, query_options)
        results_from_query_b.to_a.concat(results_from_query_b.to_a)    
      end
    end