ruby-on-rails, postgresql, clone, advantage-database-server

Incrementally copy data from one (horrible) database to another (nicer) database in rails


As with all my questions at the moment, I'm working with the "Advantage Database Server" on a remote machine, which is slow and clumsy.

It would be great if I could write a quick script to dump changes made to the "live" system into a nice PostgreSQL database.

The existing database is made up of about 30 tables; however, only about 7 of these are actively updated.

I have the ones I want copied defined as models already.

The ADS tables all have a "ROWID" pseudo-column which, according to the documentation, should stay the same within the existing database ... it is also often used as the "Primary Key" on the ADS tables, except for the fact that it isn't indexed!

I'm proposing to create a new table in PostgreSQL with a copy of this data, including the pseudo-column ROWID (not a PostgreSQL reserved word, I believe), and then doing a comparison of the live ADS data to the PostgreSQL equivalent.

class Determinand < AisBase
  self.table_name = 'DETS'
  self.sequence_name = :autogenerated
  self.primary_key = 'DET'
end

class PgDeterminand < PostgresBase
  self.sequence_name = :autogenerated
  self.primary_key = 'DET'
end

livet = Determinand.select("ROWID").map(&:ROWID)
devt = PgDeterminand.select("ROWID").map(&:ROWID)

new_dets = Determinand.where(ROWID: livet - devt)
# or maybe
(livet - devt).map do |rid|
  Determinand.find_by(ROWID: rid)
end

and then loop through the new_dets to create new PgDeterminand rows ...
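
For example, something along these lines (only a sketch; it assumes the column names match between the two tables and that ROWID comes back as a normal attribute):

new_dets.each do |det|
  PgDeterminand.create(det.attributes)
end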

the reading is very slow:

puts Benchmark.measure { livet=Determinand.select("ROWID").map(&:ROWID) }
  0.196957   0.098432   0.295389 ( 26.503560)

livet.count
 => 6136

and this is not a big table ...

can anyone think of a clearer way to look at doing this?

-- EDIT --

Okay, I've copied all the existing models to an "Ads" folder, created new objects in Postgres (based on the existing schema.rb file), and removed all the belongs_to from the models (no referential integrity on the AIS LIMS tables!). I can now quickly and easily copy the data to the new tables as follows:

def force_utf8(hsh)
  # Re-encode any String values as UTF-8, replacing invalid/undefined bytes with '?',
  # so that the data can be inserted into PostgreSQL.
  hsh.each_with_object({}) do |(i, j), a|
    a[i] = j.present? && j.is_a?(String) ? j.encode("utf-8", invalid: :replace, undef: :replace, replace: '?') : j
  end
end

Ads::Determinand.all.as_json.each do |d|
  Determinand.create(force_utf8(d))
end

This isn't incremental yet, but using the ROWID from the existing table I should be able to work from there.

-- EDIT 2 --

ROWID appears to be essentially sequential for each table ... except that it uses the order '[A-Za-z0-9+/]' ... awesome!

I was hoping to do just a "greater than last stored ROWID" for new data in the "Live" system:

Ads::Determinand.where(Ads::Determinand.arel_table['ROWID'].gt(Determinand.maximum(:ROWID))).as_json.each do |d|
  Determinand.create(force_utf8(d))
end

but this obviously doesn't cope with ROWIDs issued after an ending "zz": in a plain string comparison,

CFTquNARAXIFAAAezz is greater than CFTquNARAXIFAAAe+D

even though the latter comes later in the ROWID sequence.
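
One way to compare ROWIDs correctly on the Ruby side would be to map each one onto a key that sorts according to that '[A-Za-z0-9+/]' alphabet (a sketch only; in the end I went with the whitelist approach described below):

# Sketch: map each ROWID character to its zero-padded position in the
# '[A-Za-z0-9+/]' alphabet, so that plain string comparison of the keys
# follows the ROWID sequence order.
ROWID_ALPHABET = [*'A'..'Z', *'a'..'z', *'0'..'9', '+', '/'].freeze

def rowid_sort_key(rowid)
  rowid.chars.map { |c| format('%02d', ROWID_ALPHABET.index(c)) }.join
end

rowid_sort_key('CFTquNARAXIFAAAezz') < rowid_sort_key('CFTquNARAXIFAAAe+D')
# => true ("...e+D" sorts after "...ezz", as it should)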


Solution

    Okay, I have this mostly sorted now:

    Schema Initialisation

    first I moved all my models to an "Ads" directory (adding in "module Ads" to each model), set up 2 databases in my project and gathered the "existing" schema using rake db:schema:dump
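
    For reference, the two abstract base classes can look something like this (a sketch; the database.yml entry names are assumptions):

    # app/models/ais_base.rb: connects the Ads:: models to the live ADS database
    class AisBase < ActiveRecord::Base
      self.abstract_class = true
      establish_connection :ads        # assumed name of the ADS entry in config/database.yml
    end

    # app/models/postgres_base.rb: connects the new models to PostgreSQL
    class PostgresBase < ActiveRecord::Base
      self.abstract_class = true
      establish_connection :postgres   # assumed name of the PostgreSQL entry in config/database.yml
    end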

    then I created new models (e.g.):

    rails g model Determinand
    

    I then copied the existing table definition from ads_schema.rb into the new Rails migration, and ran rake db:migrate:postgres
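
    A minimal sketch of such a migration (the columns shown are purely illustrative, the real definition comes straight from ads_schema.rb, and the migration version depends on the Rails release in use):

    class CreateDeterminands < ActiveRecord::Migration[6.0]
      def change
        # id: false because the table keys off DET / ROWID rather than a serial id
        create_table :determinands, id: false do |t|
          t.string :DET
          t.string :ROWID
          # ... remaining columns copied across from ads_schema.rb
          t.timestamps
        end
      end
    end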

    Initial Data Dump

    I then did an initial data export/import.

    On smaller tables, I was able to use the following:

    Ads::Client.all.as_json.each do |c|
      Client.create(c)
    end
    

    but on larger tables I had to use a CSV export from the ADS, and a pgloader script to bring in the data:

    load CSV
        from 'RESULTS.csv'
            having fields
            (
                SAMPNUM, DET, JOB, GLTHAN, INPUT, OUTPUT, RESULT, ERROR, GENFLAG,
                SPECFLAG, STATFLAG, COMPFLAG, REPEAT, DETORDER, ANALYST, DETDATE [date format 'DD/MM/YYYY'],
                DETTIME, LOGDATE [date format 'DD/MM/YYYY'], APPROVED [date format 'DD/MM/YYYY'], APPROVEDBY, INSTRUMENT, FILENAME, LINE_NO,
                TEXTRESULT, DATATYPE, SUITE, TEST, SECTION, UKAS, MCERTS, ACCRED, DEVIATING,
                PRINT_1, PRINT_1_BY, PRINT_1_AT, PRINT_2, PRINT_2_BY, PRINT_2_AT, LABEL, LABLOCN
            )
        into postgresql://$user:$password@localhost/ads_project
            TARGET TABLE results
            TARGET COLUMNS
            (
                'SAMPNUM', 'DET', 'JOB', 'GLTHAN', 'INPUT', 'OUTPUT', 'RESULT', 'ERROR', 'GENFLAG',
                'SPECFLAG', 'STATFLAG', 'COMPFLAG', 'REPEAT', 'DETORDER', 'ANALYST', 'DETDATE',
                'DETTIME', 'LOGDATE', 'APPROVED', 'APPROVEDBY', 'INSTRUMENT', 'FILENAME', 'LINE_NO',
                'TEXTRESULT', 'DATATYPE', 'SUITE', 'TEST', 'SECTION', 'UKAS', 'MCERTS', 'ACCRED', 'DEVIATING',
                'PRINT_1', 'PRINT_1_BY', 'PRINT_1_AT', 'PRINT_2', 'PRINT_2_BY', 'PRINT_2_AT', 'LABEL', 'LABLOCN'
            )
        with csv header,
            fields optionally enclosed by '"',
            fields terminated by ',',
            drop indexes
    
        before load do
            $$ alter table results alter column created_at drop not null, alter column updated_at drop not null; $$

        after load do
            $$ update results set created_at = "DETDATE", updated_at = NOW() where created_at is null and updated_at is null; $$,
            $$ alter table results alter column created_at set not null, alter column updated_at set not null; $$
    ;
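
    The command file can then be run with pgloader directly, e.g. pgloader results.load (the file name here is just an example).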
    

    Incremental Updates

    for the incremental updates I have to do something like the following:

    On smaller tables (~ <1000 rows):

    Ads::DetLimit.where.not(ROWID: DetLimit.pluck(:ROWID)).as_json.each do |d|
      DetLimit.create(force_utf8(d))
    end
    

    On larger tables I need to work out the new ROWIDs in Ruby first and pass an explicit list to the query (essentially a whitelist rather than a blacklist):

    zzz = SuiteDet.pluck(:ROWID)
    yyy = Ads::SuiteDet.pluck(:ROWID)
    
    Ads::SuiteDet.where(ROWID: yyy-zzz).as_json.each do |d|
      SuiteDet.create(force_utf8(d))
    end
    

    Deployment

    I created a CopyTable script that I can run as a batch job; it now copies just the increments and takes about 2 minutes to run, which is acceptable.
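
    A rough sketch of what that script can look like (the rake task name and the table pairs listed are assumptions, and it relies on the force_utf8 helper shown earlier being available):

    # lib/tasks/copy_tables.rake (hypothetical task wrapping the incremental copies)
    namespace :ads do
      desc 'Copy incremental changes from the ADS tables into PostgreSQL'
      task copy_tables: :environment do
        {
          Ads::Determinand => Determinand,
          Ads::DetLimit    => DetLimit,
          Ads::SuiteDet    => SuiteDet
          # ... the other actively updated tables
        }.each do |source, target|
          new_ids = source.pluck(:ROWID) - target.pluck(:ROWID)
          source.where(ROWID: new_ids).as_json.each do |row|
            target.create(force_utf8(row))
          end
        end
      end
    end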