Search code examples
ruby-on-rails-4jrubydata-warehousemssql-jdbc

how to get has_many through: across servers working


I have a JRuby application that I am trying to create has_many through: relationships for databases that live on two completely different servers. I understand that joins will not work across tables on different servers. What I would like is to simulate the joins so that a developer using the model doesn't have to be (as) aware of the cross-server join.

There are some additional complexities in this setup:

  • The remote database is read-only
  • The table names and primary keys in the remote database do not follow rails naming conventions. (The remote database is a Data Warehouse)
  • I would like to be able to use the model as though a has_and_belongs_to_many was in it.

I've considered writing my own custom association, but that's a bit complex and I can't find any guides or really any starting points other then reading through the Rails code.

Is there an easy way to do this that I'm missing?

Is building a custom ActiveRecord association the best way to do this? and if so, where do I start?

Code similar to my setup:

config/database.yml

development:
  adapter: postgresql
  encoding: unicode
  database: main
  username: username
  password: password
  host: localhost
  pool: 5

remote_development: # Read only
  adapter: jdbcmssql
  driver: com.microsoft.sqlserver.jdbc.SQLServerDriver
  url: 'jdbc:sqlserver://foo.com;databaseName=main'
  username: username
  password: password

app/models/account.rb

class Portfolio < ActiveRecord::Base
  #has_and_belongs_to_many :dim_users, join_table: :accounts_dim_user
end

app/models/remote_model_base.rb

class RemoteModelBase
  require "#{Rails.root}/lib/sqljdbc4.jar"
  self.abstract_class = true
  establish_connection "remote_#{Rails.env}".to_sym
  after_initialize :readonly!
end

app/models/dim_user.rb

class DimUser < RemoteModelBase
  self.table_name = 'DimUser'
  self.primary_key = 'dwidDimUser'

  #has_and_belongs_to_many :accounts, join_table: :accounts_dim_user
end

config/schema.rb

ActiveRecord::Schema.define(version: 20140925200106) do

  create_table "accounts", force: true do |t|
    t.string   "name"
    t.datetime "created_at"
    t.datetime "updated_at"
  end

  create_table "accounts_dim_user", force: true, id: false do |t|
    t.integer  "dwidUser"
    t.integer  "account_id"
    t.datetime "created_at"
    t.datetime "updated_at"
  end

  # Defined in the remote database but it might look something like this
  # create_table "DimUser" do |t|
  #   t.integer dwidUser
  #   # ...
  # end

Solution

  • Just ran through a quick scenario for you, here's the repo: https://github.com/beneggett/many_db_example

    In the repo, I just did 2 different db's on my local, but it won't matter, the principal is the same:

    This seemed to work well for me:

    Tell the account about the account_dim_users join table association, but manually map the has_many through/habtm.

    class Account < ActiveRecord::Base
      has_many :account_dim_users
    
      def dim_users
        account_dim_users.map {|account_dim_user| DimUser.find_by(dwidUser: account_dim_user.dwidUser) }
      end
    end
    

    This is important, because, as you well know, the standard join will not work; however mapping it through the model works fine.

    AccountDimUser join table looks standard (I explicitly mapped the keys)

    class AccountDimUser < ActiveRecord::Base
      has_many :accounts
      has_many :dim_users, primary_key: :dwidUser, foreign_key: :dwidUser
    
    end
    

    Manually map the account_dim_users association, and manually map the accounts association

    class DimUser < ActiveRecord::Base
      establish_connection "other_db".to_sym
      after_initialize :readonly!
      self.table_name = 'DimUser'
      self.primary_key = 'dwidUser'
    
      def account_dim_users
        AccountDimUser.where(dwidUser: self.dwidUser)
      end
    
      def accounts
        account_dim_users.map {|account_dim_user| Account.find(account_dim_user.account_id) }
      end
    end
    

    This approach allows you to still use your Ruby objects in the standard way:

    a = Account.first
      Account Load (0.6ms)  SELECT  "accounts".* FROM "accounts"   ORDER BY "accounts"."id" ASC LIMIT 1
    => #<Account:0x00000102d263d0> {
              :id => 1,
            :name => "New account",
      :created_at => Mon, 29 Sep 2014 15:07:07 UTC +00:00,
      :updated_at => Mon, 29 Sep 2014 15:07:07 UTC +00:00
    }
    

    --

    a.account_dim_users
    => #<ActiveRecord::Associations::CollectionProxy [#<AccountDimUser id: 1, dwidUser: 1, account_id: 1, created_at: "2014-09-29 15:08:47", updated_at: "2014-09-29 15:08:47">, #<AccountDimUser id: 3, dwidUser: 5, account_id: 1, created_at: "2014-09-29 15:24:17", updated_at: "2014-09-29 15:25:06">]>
    

    --

    a.dim_users
      AccountDimUser Load (0.3ms)  SELECT "account_dim_users".* FROM "account_dim_users"  WHERE "account_dim_users"."account_id" = $1  [["account_id", 1]]
      DimUser Load (0.9ms)  SELECT  "DimUser".* FROM "DimUser"  WHERE "DimUser"."dwidUser" = 1 LIMIT 1
      DimUser Load (0.3ms)  SELECT  "DimUser".* FROM "DimUser"  WHERE "DimUser"."dwidUser" = 5 LIMIT 1
    => [
      [0] #<DimUser:0x0000010981af10> {
                :id => 1,
          :dwidUser => 1,
        :created_at => Mon, 29 Sep 2014 15:06:44 UTC +00:00,
        :updated_at => Mon, 29 Sep 2014 15:06:44 UTC +00:00
      },
      [1] #<DimUser:0x00000109838b00> {
                :id => 5,
          :dwidUser => 5,
        :created_at => Mon, 29 Sep 2014 15:23:01 UTC +00:00,
        :updated_at => Mon, 29 Sep 2014 15:23:01 UTC +00:00
      }
    ]
    

    --

    d = DimUser.first
      DimUser Load (0.5ms)  SELECT  "DimUser".* FROM "DimUser"   ORDER BY "DimUser"."dwidUser" ASC LIMIT 1
    => #<DimUser:0x0000010990aad8> {
              :id => 1,
        :dwidUser => 1,
      :created_at => Mon, 29 Sep 2014 15:06:44 UTC +00:00,
      :updated_at => Mon, 29 Sep 2014 15:06:44 UTC +00:00
    }
    

    --

    d.account_dim_users
      AccountDimUser Load (0.5ms)  SELECT "account_dim_users".* FROM "account_dim_users"  WHERE "account_dim_users"."dwidUser" = 1
    => #<ActiveRecord::Relation [#<AccountDimUser id: 1, dwidUser: 1, account_id: 1, created_at: "2014-09-29 15:08:47", updated_at: "2014-09-29 15:08:47">]>
    

    --

     d.accounts
      AccountDimUser Load (0.5ms)  SELECT "account_dim_users".* FROM "account_dim_users"  WHERE "account_dim_users"."dwidUser" = 1
      Account Load (0.4ms)  SELECT  "accounts".* FROM "accounts"  WHERE "accounts"."id" = $1 LIMIT 1  [["id", 1]]
    => [
      [0] #<Account:0x000001099788d0> {
                :id => 1,
              :name => "New account",
        :created_at => Mon, 29 Sep 2014 15:07:07 UTC +00:00,
        :updated_at => Mon, 29 Sep 2014 15:07:07 UTC +00:00
      }
    ]
    

    There is probably some optimizations that could be done to this when dealing with large quantities of records, but it's a good base.

    The other approach might be to do lookups on the association table itself, like:

      def find_dim_user
        DimUser.find_by(dwidUser: self.dwidUser)
      end
    

    But I very much prefer the first way I suggested, as it lets you do your normal ruby method chaining approach to associations.

    Any other questions, let me know!

    Edit: You could change the map function to use Active Record Relations or similar as well, enabling more functionality:

    class Account < ActiveRecord::Base
      has_many :account_dim_users
    
      def dim_users
        dim_user_ids = account_dim_users.map {|account_dim_user| account_dim_user.dwidUser }
        DimUser.where(dwidUser: dim_user_ids)
      end
    end