Search code examples
ruby-on-railsactiverecord

Whats the difference between `joins` and `references` in Rails?


What is the difference between this:

User.includes(:sites).where("sites.name = 'test'").references(:sites)

and this:

User.joins(:sites).where(sites: { name: "test"} )

Is there a general consensus on using one method over the other?


Solution

  • They do completely different things.

    .joins(:sites) adds a INNER JOIN to the relation which limits the results to users with at least one site before the where clause is applied.

    No columns off the joined table will be selected (unless you do so manually). Use it when you just want to filter the results with a join but don't care about the data in the related table or with .select to select aggregates.

    It does not prevent N+1 queries when you access the association:

    users = User.joins(:sites).where(sites: { name: "test" } )
    # SELECT "users".* 
    # FROM "users" 
    # INNER JOIN "sites" 
    #   ON "sites"."user_id" = "users"."id" 
    # WHERE "sites"."name" = 'test'
    
    users.each do |user|
      user.sites.each do |site|
        puts site.name # creates an additional database query for every site
      end
    end 
    

    .includes avoids N+1 queries in a "smart" way. It's extremely overused and you're usually better off just thinking about how you want to use the data and using the methods it delegates to directly.

    It will try to use .preload to do two queries where the first would select the data off the users table and the second query will select data off the sites tables when the assocation is first accessed on any member of the collection. This will avoid loading the data unless it's actually used while at the same time preventing a N+1 query for every member of the association.

    users = User.includes(:sites) # SELECT "users".* FROM "users" 
    users.each do |user|
      user.sites.each do |site|
        puts site.name # only one additional database to fetch the sites is fired
      end
    end
    

    However if the query references the sites table through the where clause .preload won't work as it's not actually joining anything. .includes is actually smart enough to detect this in most cases and will use .eager_load to load everything in a single query with an outer join.

    .references explicitly tells .includes to use .eager_load and in that example will prevent a database driver that occurs when the query doesn't join sites.

    User.includes(:sites).where("sites.name = 'test'")
    #  User Load (0.8ms)  SELECT "users".* FROM "users" WHERE (sites.name = 'test') /* loading for pp */ LIMIT 11 /*application='Sandbox8'*/
    # An error occurred when inspecting the object: #<ActiveRecord::StatementInvalid:"SQLite3::SQLException: no such column: sites.name:\nSELECT \"users\".* FROM \"users\" WHERE (sites.name = 'test') /* loading for pp */ LIMIT 11 /*application='Sandbox8'*/\n     
    

    But if you just pass a hash to .where instead of actively sabotaging it with a SQL string for the sake of the example it does just fine:

    User.includes(:sites).where(sites: { name: "test" })
    

    It's even more pointless if you consider that you could just cut out the middle man instead of being so indecisive.

    # This runs just fine
    User.eager_load(:sites).where("sites.name = 'test'")
    
    # But this is preferable
    User.eager_load(:sites).where(sites: { name: "test" })
    

    I guess .references could still fill a very niche role if you're building a query programatically and want to use the lazy loading nature of .includes but at the same time avoid a reference error (this scenario took a lot of imagination). But in general - YAGNI.