Search code examples
ruby-on-railsrubypostgresqlactiverecordactiverecord-import

Is there an efficient means of using ActiveRecord#Pluck for my situation?


I need to insert a lot of data into a new database. Like, a lot of data, so even nanoseconds count in the context of this query. I'm using activerecord-import to bulk-insert into Postgres, but that doesn't really matter for the scope of this question. Here's what I need:

I need an array that looks like this for each record in the existing DB:

[uuid, timestamp, value, unit, different_timestamp]

The issue is that the uuid is stored on the parent object that I'm looping through to get to this object, so #pluck works for each component aside from that. More annoying is that it is stored as an actual uuid, not a string, and needs to be stored as a uuid (not a string) in the new db as well. I'm not sure but I think using a SELECT inside of #pluck will return a string.

But perhaps the bigger issue is that I need to perform a conversion on the value of value before it is inserted again. It's a simple conversion, in effect just value / 28 or something, but I'm finding it hard to work that into #pluck without also tacking on #each_with_object or something (which slows this down considerably)

Here's the query as it is right now. It seems really silly to me to load the entire record based on the blockage outlined above. I hope there's an alternative.

Klass.find_each do |klass|
    Data.where(token: klass.token).find_each do |data|
      data << [
        klass.uuid,
        data.added_at,
        data.value / conversion,
        data.unit, 
        data.created_at
      ]
    end
end

And no, the parent and Data are not associated right now and it's not an option, so I can't eager-load or just call Klass.data (they will be linked after this transition).

So ideally this is what I'm looking for:

Data.where(token: klass.token).pluck(:added_at, :value, :unit, :created_at)

But with the parameters outlined above.


Solution

  • I wonder if you can combine a SQL JOIN with pluck:

    Klass
      .joins('INNER JOIN datas ON datas.token = klasses.token')
      .pluck('klasses.uuid', 'datas.added_at', "datas.value / #{conversion.to_f}", 'datas.unit', 'datas.created_at')