there are many ActiveRecord versioning gems available to Rails but most if not all of them are having trouble being maintained. on top of that, some of them seem to have various foreign key association issues.
I'm in the process of coding a content management system where pages are stored in a tree-like hierarchy and the page fields are stored in a separate table using EAV model.
keeping that in mind, I'm not looking for an all encompassing revisioning gem because I honestly don't think I'll find one. what I am looking for is some advice on how to handle this as a custom implementation. should I have a separate table for storing revisions and referring to a revision number in my EAV table? I foresee that this may lead to some complex validation problems though. I currently have a problem finding a clean way to validate a regular EAV table anyway so if anyone can comment on this it would be very much appreciated as well.
I hope this question is written well enough to SO standards. if you need any additional information, please do not hesitate to ask and I will try to help you help me. :)
I currently have a problem finding a clean way to validate a regular EAV table anyway so if anyone can comment on this it would be very much appreciated as well.
There isn't a clean way to either validate or constrain an EAV table. That's why DBAs call it an anti-pattern. (EAV starts on slide 16.) Bill doesn't talk about version, so I will.
Versioning looks simple, but it's not. To version a row, you can add a column. It doesn't really matter much whether it's a version number or a timestamp.
create table test (
test_id integer not null,
attr_ts timestamp not null default current_timestamp,
attr_name varchar(35) not null,
attr_value varchar(35) not null,
primary key (test_id, attr_ts, attr_name)
);
insert into test (test_id, attr_name, attr_value) values
(1, 'emp_id', 1),
(1, 'emp_name', 'Alomar, Anton');
select * from test;
test_id attr_ts attr_name attr_value
--
1 2012-10-28 21:00:59.688436 emp_id 1
1 2012-10-28 21:00:59.688436 emp_name Alomar, Anton
Although it might not look like it on output, all those attribute values are varchar(35). There's no simple way for the dbms to prevent someone from entering 'wibble' as an emp_id. If you need type checking, you have to do it in application code. (And you have to keep sleep-deprived DBAs from using the command-line and GUI interfaces the dbms provides.)
With a normalized table, of course, you'd just declare emp_id to be of type integer.
With versioning, updating Anton's name becomes an insert.
insert into test (test_id, attr_name, attr_value) values
(1, 'emp_name', 'Alomar, Antonio');
With versioning, selection is mildly complicated. You can use a view instead of a common table expression.
with current_values as (
select test_id, attr_name, max(attr_ts) cur_ver_ts
from test
-- You'll probably need an index on this pair of columns to get good performance.
group by test_id, attr_name
)
select t.test_id, t.attr_name, t.attr_value
from test t
inner join current_values c
on c.test_id = t.test_id
and c.attr_name = t.attr_name
and c.cur_ver_ts = t.attr_ts
test_id attr_name attr_value
--
1 emp_id 1
1 emp_name Alomar, Antonio
A normalized table of 1 million rows and 8 non-nullable columns has a million rows. A similar EAV table has 8 million rows. A versioned EAV table has 8 million rows, plus a row for every change to every value and every attribute name.
Storing a version number, and joining to a second table that contains the current values doesn't gain much, if anything at all. Every (traditional) insert would require inserts into two tables. What would be one row of 8 columns becomes 16 rows (8 in each of two tables.).
Selection is a little simpler, requiring only a join.