
How to query entire tables (3 tables with 100k records each) and display their data in a dashboard


I have 3 models, backed by a PostgreSQL database:

##user.rb - :full_name
has_many :enquiries

##enquiry.rb - :actual_target_count, :actual_visit_count
##there are different types of enquiry too - business, formal, phone, email etc
belongs_to :user

##sales_projection.rb - :target_month, :desired_target_count, :desired_visit_count
belongs_to :user
belongs_to :enquiry

Problem statement: 500+ users have provided targets for the current month (July 2023) across different types of enquiries (60k entries). How can I get a projection for a particular month that shows every user's desired vs. actual targets?

Let me elaborate this in more detail below -

records in users table (500+ entries)

{id: 1, full_name: "mike"}
{id: 2, full_name: "jason"}
{id: 3, full_name: "remi"}
{id: 4, full_name: "sam"}

records in enquiries table (500+ entries), every enquiry can have multiple entries of users

mike has worked on 5 enquiries, mike has worked on 20 enquiries and so on....

{id: 1, user_id: 1, actual_target_count: 30, actual_visit_count: 40}
{id: 2, user_id: 1, actual_target_count: 10, actual_visit_count: 05}
{id: 3, user_id: 1, actual_target_count: 20, actual_visit_count: 10}

records in sales_projections table (50k+ entries); each sales_projection record stores one user's projections (desired target and visit counts) for a selected month (here it's July 2023).

{id: 1, user_id: 1, desired_target_count: 30, desired_visit_count: 40, target_month: "July 2023"}
{id: 2, user_id: 2, desired_target_count: 10, desired_visit_count: 05, target_month: "June 2023"}
{id: 3, user_id: 3, desired_target_count: 20, desired_visit_count: 10, target_month: "July 2023"}
{id: 4, user_id: 4, desired_target_count: 50, desired_visit_count: 15, target_month: "March 2023"}

Based on the above set of records, how can I show a list of all the users and their desired/actual count for a selected month?

This is what I want to show:

user   |   desired target  | actual target | desired visit | actual visit | target_month

This does not seem feasible with plain ActiveRecord (which is what I currently use), as it is very time-consuming and increases the page load time.

Currently this is what I have:

## pseudo code :)
SalesProjection.where(target_month: "July 2023").order(target_month: :asc).group_by { |m| m.user.id }.map do |user_id, projections|
  {
    # get user full name
    # get user desired count using joins
    # get user actual count using joins
  }
end

The above logic takes over a minute to execute.

What I have tried but need more help:

  • A CTE (common table expression): I haven't used one before, and my attempts so far have failed, so I need help here.
  • A direct way to write a single query, with dynamic values, that returns the results in the desired output format.

Solution

  • Your query strategy is slow because it instantiates many ActiveRecord objects. The key is to load only the records and fields of interest, and to do so in a single query. Here is how you might structure it:

    class User < ApplicationRecord
      has_many :enquiries
      has_many :sales_projections # note: missing from OP

      def self.report
        # Select only the columns the report needs.
        columns = <<~SQL
          users.full_name,
          sales_projections.target_month,
          sales_projections.desired_target_count,
          enquiries.actual_target_count,
          sales_projections.desired_visit_count,
          enquiries.actual_visit_count
        SQL

        User
          .select(columns)
          .joins(:enquiries, :sales_projections)
          .where("sales_projections.target_month = ?", "July 2023")
          .where("extract(month from enquiries.created_at)::integer = 7")
          .where("extract(year from enquiries.created_at)::integer = 2023")
          # is_visited / is_qualified are not listed in the question's schema
          .where("enquiries.is_visited = ? and enquiries.is_qualified = ?", true, true)
          .map do |u|
            # One plain array per joined row (user x enquiry x projection).
            [u.full_name,
             u.target_month,
             u.desired_target_count,
             u.actual_target_count,
             u.desired_visit_count,
             u.actual_visit_count]
          end
      end
    end
    
    User.report # => [["Olene", "July 2023", 82, 48, 80, 0],
    #    ["Kalyn", "July 2023", 10, 2, 26, 0],
    #    ["Drew", "July 2023", 7, 36, 53, 26],
    #    ["Terry", "July 2023", 99, 26, 44, 37],
    #    ["Olene", "July 2023", 82, 36, 80, 2],
    #    ["Sherlene", "July 2023", 71, 19, 79, 43],
    #    ["Delmer", "July 2023", 5, 48, 43, 20],
    #    ["Connie", "July 2023", 86, 1, 42, 34],
    #    ["Candis", "July 2023", 86, 17, 13, 9],
    #    ["Candis", "July 2023", 86, 11, 13, 21], ...etc.
    
    

    This should be much faster than your query. With 50 users, 5000 sales_projections, and 1000 enquiries, the query took 50 ms. I didn't use any indexes; adding them should improve performance further.

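    Rather than filtering enquiries.created_at as a range between two dates, I query the month and year values directly, which I expect to perform better. If you add an index on created_at, compare both forms: a range condition can use a plain index on the column, whereas extract(...) needs an expression index.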
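
The month and year values hardcoded in the query can be derived from the target_month label, which covers the "dynamic values" part of the question. A minimal sketch in plain Ruby, assuming labels always look like "July 2023"; `month_filters` is a hypothetical helper name, not something from the original answer:

```ruby
require "date"

# Hypothetical helper: turns a label such as "July 2023" into the values
# needed by either filtering style (extract month/year, or a date range).
def month_filters(target_month)
  # With only month and year in the format, the day defaults to 1.
  first_day = Date.strptime(target_month, "%B %Y")

  {
    target_month: target_month,
    month: first_day.month,
    year: first_day.year,
    # Exclusive range: first day of the month up to, but not including,
    # the first day of the following month.
    created_at_range: first_day...first_day.next_month
  }
end

filters = month_filters("July 2023")
puts filters[:month]
puts filters[:year]
```

The values could then be passed as bind parameters, for example `where("extract(month from enquiries.created_at)::integer = ?", filters[:month])`, or as a range with `where(enquiries: { created_at: filters[:created_at_range] })`.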

    I'll be curious to know how fast this performs with your data set.
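
Since the question asks about CTEs, here is a minimal sketch (not part of the original answer) of how the report could be reduced to one row per user by aggregating each table in its own CTE before joining. It assumes that summing the counts per user is the intended aggregation and that enquiries has a created_at column, as the query above does. `monthly_report_query` is a hypothetical helper name, and the is_visited / is_qualified filters are left out because they do not appear in the question's schema.

```ruby
require "date"

# Hypothetical helper: returns [sql, *binds], the array form accepted by
# ActiveRecord's find_by_sql. Table and column names follow the question.
def monthly_report_query(year, month)
  from  = Date.new(year, month, 1)
  to    = from.next_month           # exclusive upper bound
  label = from.strftime("%B %Y")    # e.g. "July 2023", as stored in target_month

  sql = <<~SQL
    WITH desired AS (
      SELECT user_id,
             target_month,
             SUM(desired_target_count) AS desired_target,
             SUM(desired_visit_count)  AS desired_visit
      FROM sales_projections
      WHERE target_month = ?
      GROUP BY user_id, target_month
    ),
    actual AS (
      SELECT user_id,
             SUM(actual_target_count) AS actual_target,
             SUM(actual_visit_count)  AS actual_visit
      FROM enquiries
      WHERE created_at >= ? AND created_at < ?
      GROUP BY user_id
    )
    SELECT users.full_name,
           desired.desired_target,
           COALESCE(actual.actual_target, 0) AS actual_target,
           desired.desired_visit,
           COALESCE(actual.actual_visit, 0)  AS actual_visit,
           desired.target_month
    FROM users
    JOIN desired ON desired.user_id = users.id
    LEFT JOIN actual ON actual.user_id = users.id
    ORDER BY users.full_name
  SQL

  [sql, label, from, to]
end

sql, *binds = monthly_report_query(2023, 7)
puts binds.inspect
```

In a Rails app the returned array could be passed to `User.find_by_sql(monthly_report_query(2023, 7))`, which substitutes the `?` placeholders. Because each table is aggregated before the join, a user with several enquiries and several projections is not multiplied into many rows, and users with a projection but no enquiries in the month still appear with actual counts of 0.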