As with all my questions at the moment, I'm working with the "Advantage Database Server" on a remote machine, which is slow and clumsy.
It would be great if I could write a quick script to dump changes made to the "live" system into a nice PostgreSQL database.
The existing database is made up of about 30 tables, however only about 7 of these are actively updated.
I have the ones I want copied defined as models already.
The ADS tables all have a pseudo-column "ROWID", which (according to the documentation) should stay the same within the existing database ... it is also often used as the "Primary Key" on the ADS tables, except for the fact that it isn't indexed!
I'm proposing to create a new table in PostgreSQL with a copy of this data, including the pseudo-column ROWID (not a PostgreSQL reserved word, I believe), and then doing a comparison of the live ADS data to the PostgreSQL equivalent.
class Determinand < AisBase
  self.table_name = 'DETS'
  self.sequence_name = :autogenerated
  self.primary_key = 'DET'
end

class PgDeterminand < PostgresBase
  self.sequence_name = :autogenerated
  self.primary_key = 'DET'
end
livet = Determinand.select("ROWID").map(&:ROWID)   # ROWIDs in the live ADS table
devt  = PgDeterminand.select("ROWID").map(&:ROWID) # ROWIDs already in PostgreSQL

new_dets = Determinand.where(ROWID: livet - devt)
# or maybe
(livet - devt).map do |rid|
  Determinand.find_by(ROWID: rid)
end
and then loop through the new_dets to create the new PgDeterminand rows ...
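Something like the following is what I have in mind for the copy step (a sketch; it assumes the attribute names line up between the two models and that ROWID comes back as an attribute on the ADS side):

(livet - devt).each do |rid|
  det = Determinand.find_by(ROWID: rid)         # fetch the missing row from ADS
  PgDeterminand.create(det.attributes) if det   # mirror it into PostgreSQL
end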
the reading is very slow:
puts Benchmark.measure { livet=Determinand.select("ROWID").map(&:ROWID) }
0.196957 0.098432 0.295389 ( 26.503560)
livet.count
=> 6136
and this is not a big table ...
can anyone think of a clearer way to look at doing this?
-- EDIT --
Okay, I've copied all the existing models to an "Ads" folder, created new tables in Postgres (based on the existing schema.rb file), and removed all the belongs_to associations from the models (no referential integrity on the AIS LIMS tables!). I can now quickly and easily copy the data to the new tables as follows:
# Re-encode string values as UTF-8 (replacing invalid/undefined bytes)
# so the ADS data can be inserted into PostgreSQL.
def force_utf8(hsh)
  hsh.each_with_object({}) do |(key, value), out|
    out[key] = value.present? && value.is_a?(String) ? value.encode("utf-8", invalid: :replace, undef: :replace, replace: '?') : value
  end
end
Ads::Determinand.all.as_json.each do |d|
  Determinand.create(force_utf8(d))
end
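If the row-by-row create turns out to be slow on the PostgreSQL side, the initial load could probably be batched with insert_all instead (a sketch, assuming Rails 6+; insert_all skips validations and doesn't fill the timestamps, so they are merged in by hand):

now = Time.current
rows = Ads::Determinand.all.as_json.map do |d|
  force_utf8(d).merge("created_at" => now, "updated_at" => now)
end
rows.each_slice(1_000) { |slice| Determinand.insert_all(slice) }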
This isn't incremental yet, but using the ROWID from the existing table I should be able to work from there.
-- EDIT 2 --
ROWID appears to be essentially sequential for each table ... except that it uses the order '[A-Za-z0-9+/]' ... awesome!
I was hoping to do just a "greater than last stored ROWID" for new data in the "Live" system:
Ads::Determinand.where(Ads::Determinand.arel_table['ROWID'].gt(Determinand.maximum(:ROWID))).as_json.each do |d|
  Determinand.create(force_utf8(d))
end
but this obviously doesn't cope with ROWIDs after an ending "zz": as a plain string comparison,
CFTquNARAXIFAAAezz
is greater than CFTquNARAXIFAAAe+D, even though '+' comes after 'z' in the ROWID alphabet, so the "+D" row is actually the newer one.
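If a "greater than last stored ROWID" approach were still wanted, the comparison could be done on the Ruby side by ranking characters in that alphabet rather than by their byte values (a sketch; ADS_ALPHABET just spells out the order quoted above):

ADS_ALPHABET = [*'A'..'Z', *'a'..'z', *'0'..'9', '+', '/'].freeze
ADS_RANK = ADS_ALPHABET.each_with_index.to_h.freeze

# Map a ROWID to a byte string that sorts in ADS generation order.
def rowid_sort_key(rowid)
  rowid.chars.map { |c| ADS_RANK.fetch(c) }.pack('C*')
end

rowid_sort_key('CFTquNARAXIFAAAe+D') > rowid_sort_key('CFTquNARAXIFAAAezz') # => true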
Okay, I have this mostly sorted now:
First I moved all my models to an "Ads" directory (adding "module Ads" to each model), set up 2 databases in my project, and gathered the "existing" schema using rake db:schema:dump.
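For reference, the two base classes end up looking something like this (a sketch; the "ads" and "postgres" connection names in config/database.yml are assumptions):

# One abstract base class per database, each pointing at a named
# entry in config/database.yml.
class AisBase < ActiveRecord::Base
  self.abstract_class = true
  establish_connection :ads
end

class PostgresBase < ActiveRecord::Base
  self.abstract_class = true
  establish_connection :postgres
end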
then I created new models (e.g.):
rails g model Determinand
I then copied the existing table definition from ads_schema.rb into the Rails migration, and ran rake db:migrate:postgres.
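A minimal sketch of one of those migrations (the Rails version in the header and the column sample are illustrative; the real column list comes straight from ads_schema.rb):

class CreateDeterminands < ActiveRecord::Migration[6.0]
  def change
    create_table :determinands do |t|
      # Columns copied from ads_schema.rb; only a sample shown here.
      t.string :DET
      t.string :ROWID

      t.timestamps
    end
    # ROWID is the lookup key for the incremental copies, so index it here
    # even though it isn't indexed on the ADS side.
    add_index :determinands, :ROWID, unique: true
  end
end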
I then did an initial data export/import.
On smaller tables, I was able to use the following:
Ads::Client.all.as_json.each do |c|
  Client.create(c)
end
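For mid-sized tables, a batched copy avoids holding the whole table in memory at once (a sketch, assuming find_each behaves itself through the ADS adapter):

# Copy in batches of 500 rows instead of materialising the whole table.
Ads::Client.find_each(batch_size: 500) do |c|
  Client.create(force_utf8(c.as_json))
end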
but on larger tables I had to use a CSV export from ADS and a pgloader script to bring in the data:
load CSV
from 'RESULTS.csv'
having fields
(
SAMPNUM, DET, JOB, GLTHAN, INPUT, OUTPUT, RESULT, ERROR, GENFLAG,
SPECFLAG, STATFLAG, COMPFLAG, REPEAT, DETORDER, ANALYST, DETDATE [date format 'DD/MM/YYYY'],
DETTIME, LOGDATE [date format 'DD/MM/YYYY'], APPROVED [date format 'DD/MM/YYYY'], APPROVEDBY, INSTRUMENT, FILENAME, LINE_NO,
TEXTRESULT, DATATYPE, SUITE, TEST, SECTION, UKAS, MCERTS, ACCRED, DEVIATING,
PRINT_1, PRINT_1_BY, PRINT_1_AT, PRINT_2, PRINT_2_BY, PRINT_2_AT, LABEL, LABLOCN
)
into postgresql://$user:$password@localhost/ads_project
TARGET TABLE results
TARGET COLUMNS
(
'SAMPNUM', 'DET', 'JOB', 'GLTHAN', 'INPUT', 'OUTPUT', 'RESULT', 'ERROR', 'GENFLAG',
'SPECFLAG', 'STATFLAG', 'COMPFLAG', 'REPEAT', 'DETORDER', 'ANALYST', 'DETDATE',
'DETTIME', 'LOGDATE', 'APPROVED', 'APPROVEDBY', 'INSTRUMENT', 'FILENAME', 'LINE_NO',
'TEXTRESULT', 'DATATYPE', 'SUITE', 'TEST', 'SECTION', 'UKAS', 'MCERTS', 'ACCRED', 'DEVIATING',
'PRINT_1', 'PRINT_1_BY', 'PRINT_1_AT', 'PRINT_2', 'PRINT_2_BY', 'PRINT_2_AT', 'LABEL', 'LABLOCN'
)
with csv header,
fields optionally enclosed by '"',
fields terminated by ',',
drop indexes
before load do
$$ alter table results alter column created_at drop not null, alter column updated_at drop not null; $$

after load do
$$ update results set created_at = "DETDATE", updated_at = NOW() where created_at is null and updated_at is null; $$,
$$ alter table results alter column created_at set not null, alter column updated_at set not null; $$
;
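Saved to a command file (e.g. results.load), the script can then be run with pgloader results.load once the CSV export is in place.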
For the incremental updates I have to do something like the following.
On smaller tables (fewer than ~1000 rows):
Ads::DetLimit.where.not(ROWID: DetLimit.pluck(:ROWID)).as_json.each do |d|
  DetLimit.create(force_utf8(d))
end
On larger tables I need to use Ruby to limit the IDs to those that have changed (essentially a whitelist rather than a blacklist):
zzz = SuiteDet.pluck(:ROWID)      # ROWIDs already copied to PostgreSQL
yyy = Ads::SuiteDet.pluck(:ROWID) # ROWIDs currently in the live ADS table
Ads::SuiteDet.where(ROWID: yyy - zzz).as_json.each do |d|
  SuiteDet.create(force_utf8(d))
end
I created a CopyTable script to run so that I can batch it. With just the increments now it takes about 2 minutes to run, which is acceptable.
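For reference, a sketch of what that script might look like, reusing force_utf8 and the ROWID whitelist approach above (the model list is illustrative):

# Copy any rows whose ROWID is not yet present on the PostgreSQL side,
# one Ads::/Postgres model pair at a time.
MODEL_PAIRS = {
  Ads::Determinand => Determinand,
  Ads::DetLimit    => DetLimit,
  Ads::SuiteDet    => SuiteDet
}.freeze

MODEL_PAIRS.each do |source, target|
  missing = source.pluck(:ROWID) - target.pluck(:ROWID)
  source.where(ROWID: missing).as_json.each do |row|
    target.create(force_utf8(row))
  end
end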