Search code examples
postgresqlselectruby-on-rails-5arelpg-query

Add a column to select for order_by in Postgresql


I have a complicated db query that selects schools based on real estate listing attributes, performance statistics for the schools, and the distance of each listing to public transit. Users create a Search object, and a method find_schools in the search.rb has this query:

         School.where(id: school_ids).narrow_schools_for_search(self,prop_type,status,year).joins(listings: 
         :cta_listings).joins(:performance_stats).where("cta_listings.distance <= ?", 
         self.cta_distance).where.not(performance_stats: {"#{sort_column.to_sym}" => 
         nil}).distinct.limit(30).order("performance_stats.#{sort_column} DESC")

School.rb
scope :narrow_schools_for_search, ->(search,prop_type,status,year) {joins(:listings).joins(:performance_stats)
     .where("listings.beds >= ?",search.beds).where("listings.price <= ?",search.max_price)
     .where("listings.price >= ?",search.min_price).where(listings: {prop_type: prop_type, status: status})
     .where(performance_stats: {year: year}).distinct}
  
  has_many :performance_stats, dependent: :destroy
  has_many :assignments, dependent: :destroy
  has_many :listings, through: :assignments

Listing.rb
has_many :assignments, dependent: :destroy
    has_many :schools, through: :assignments
    has_many :cta_listings, dependent: :destroy
    has_many :cta_stations, through: :cta_listings
    has_many :metra_listings, dependent: :destroy
    has_many :metra_stations, through: :metra_listings

PerformanceStat.rb
belongs_to :school

I need the Schools ordered by an attribute in the associated table PerformanceStats, which is a user defined attribute sort_column. The query works in development env (sqlite3), but fails on the staging app (PG) with this error:

ActiveRecord::StatementInvalid (PG::InvalidColumnReference: ERROR:  for SELECT DISTINCT, ORDER BY expressions must appear in select list

I need to add a select statement that includes the column name by which I am sorting schools.

The advice on other posts like this one is to do something like:

Widget.select('"widgets".*, "widget_steps.name"')

So, for my case, I tried this:

sort_for_select = "performance_stats.#{sort_column}"

    School.select('"schools".*, "#{sort_for_select"').where(id: school_ids).narrow_schools_for_search(self,prop_type,status,year).joins(listings: 
                 :cta_listings).joins(:performance_stats).where("cta_listings.distance <= ?", 
                 self.cta_distance).where.not(performance_stats: {sort_column.to_sym => 
                 nil}).distinct.limit(30).order("performance_stats.#{sort_column} DESC")

But my editor is indicating I'm not actually escaping to to ruby. I tried it anyway, and sure enough, it fails with

 ActiveRecord::StatementInvalid (PG::UndefinedColumn: ERROR:  column "#{sort_for_select}" does not exist.

Then I tried hard coding the sort_column:

School.select('"schools".*, "performance_stats.grall_adjpicalc"').where(id: school_ids).narrow_schools_for_search(self,prop_type,status,year).joins(listings: 
                     :cta_listings).joins(:performance_stats).where("cta_listings.distance <= ?", 
                     self.cta_distance).where.not(performance_stats: {grall_adjpicalc:
                     nil}).distinct.limit(30).order("performance_stats.grall_adjpicalc DESC")

This works in dev environment, but if fails on the staging app with this error:

ActiveRecord::StatementInvalid (PG::UndefinedColumn: ERROR:  column "performance_stats.grall_adjpicalc" does not exist

So at this point I have to deploy every time to test new ideas. I know PG in development would be ideal, but I lost an entire week trying to change over and couldn't get it to work. ended up losing everything and had to reseed from scratch.

I have 3 questions:

  1. What am I doing wrong with the Select statement?

  2. Is there another fast way to do this that avoids this issue? I was thinking instead of Distinct, perhaps I could go with uniq, converting to an array then sort the array accordingly.

  3. How can I get the variable sort_column into the select statement?

Any thoughts or suggestions are much appreciated!


Solution

  • Update: the working code in my original answer was slow, often ending up in timeouts on Heroku. I ended up with a query that operates 3x faster using arel_tables. The working code looks like this (self is an @search object). I'm not a professional coder, so without a doubt this could perform even faster, and I welcome any suggestions to speed up performance. It still takes 5-8 seconds depending on the query. But at least I'm not getting timed out anymore.

      school = School.arel_table
      pstat = PerformanceStat.arel_table
      schools = school.project(Arel.star).join(pstat).on(pstat[:school_id].eq(school[:id]).and(
              school[:area_id].in(self.area_ids).and(
                  pstat[:year].eq(year)
                )
              )
            )
       query = schools.to_sql
       school_ids = School.find_by_sql(query).pluck(:id)
    

    This gives me an array of possible school ids, now I need to find the listings in these schools that match the other search parameters, including max distance to public transit if selected. Schools and listings are joined via an Assignment model.

            listing = Listing.arel_table
            assignment = Assignment.arel_table
             if self.cta || self.metra
               cta_listing = CtaListing.arel_table
               metra_listing = MetraListing.arel_table
               
               listing_assign = listing.join(assignment).on(assignment[:listing_id].eq(listing[:id])).join(cta_listing, Arel::Nodes::OuterJoin).on(cta_listing[:listing_id].eq(listing[:id])).join(metra_listing, Arel::Nodes::OuterJoin).on(metra_listing[:listing_id].eq(listing[:id]))
               
               selected_listings = listing_assign.project(assignment[:school_id], listing[:id]).where(
               assignment[:school_id].in(school_ids).and(
               cta_listing[:distance].lteq(self.cta_distance).or(
               metra_listing[:distance].lteq(self.metra_distance))).and(
               listing[:prop_type].in(prop_type).and(
               listing[:status].in(status).and(
               listing[:beds].gteq(self.beds).and(
               listing[:active].eq(true).and(
               listing[:price].lteq(self.max_price).and(
               listing[:price].gteq(self.min_price))))))))
             else
               listing_assign = listing.join(assignment).on(assignment[:listing_id].eq(listing[:id]))
    
               selected_listings = listing_assign.project(assignment[:school_id], listing[:id]).where(
               assignment[:school_id].in(school_ids).and(
               listing[:prop_type].in(prop_type).and(
               listing[:status].in(status).and(
               listing[:beds].gteq(self.beds).and(
               listing[:active].eq(true).and(
               listing[:price].lteq(self.max_price).and(
               listing[:price].gteq(self.min_price))))))))
    
             end
    
           q = selected_listings.to_sql
           listings = Listing.find_by_sql(q)
    

    Now I have an AR Relation of all the listings that match the search. I can't limit to 30 schools before this point, because I'm not sure if a school will have any listings that match the requirements. Every school must have at lease one listing. I need to return the top 30 schools and their listings, so first I create an array of arrays with each listings id and the corresponding school_id.

           listings_array = listings.map{|x| [x.school_id,x.id]}
    

    Then I convert this array of arrays into a hash grouping by school_id:

           listings_hash = listings_array.group_by{|school_id| school_id.shift}.transform_values do |values|
       values.flatten.uniq
     end
    

    Now I can sort these schools by the selected column and pick the top 30.

     if sort_column.nil?
       sort_column = "grall_adjpicalc"
     end
    
     schools = School.where(id: listings_hash.keys).includes(:performance_stats).where(performance_stats: 
     {year: year}).order("performance_stats.#{sort_column} desc").limit(30)
    

    Now I have our top 30 schools and can return an array of arrays with each school and its corresponding listing_ids.

     schools_array = schools.map{|school| [school,listings_hash[school.id]]}
     return schools_array
    

    This is much longer that the previous answer, but at least 3x faster. Can you find a way to make this significantly faster?