
Potential scalability issue with Javers keeping data in a single table?


We were looking into different alternatives for storing object changes and found JaVers, which seems to be a tool built exactly for this purpose.

We have built a prototype (using MySQL for change repo) which has worked out well and delivered what has been promised. So far so good.

However, it seems that JaVers stores all its internal data in 4 tables. This is not a big problem for small datasets, but what happens if, say, the original schema has really large tables (millions or billions of records each)? Every update to a record in such a table adds a record to the JaVers audit tables, which would become massive (most likely larger than the original database).

From our previous experience with large audit tables, we ran into problems such as inserts slowing down and queries taking ages. We will also need to extract deltas quite frequently, so this looks like a ticking time bomb.

1) Is it possible to configure JaVers to store changes in separate tables, one set per entity, something like:

  • foo_global_id, foo_snapshot, foo_commit, foo_commit_property
  • bar_global_id, bar_snapshot, bar_commit, bar_commit_property

If it is not possible at the moment, how hard would it be to add such a feature (we are willing to invest time and submit patches)?
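To make the layout proposed in 1) concrete, here is a minimal in-memory sketch in Java (16+, for the record). `PerEntitySnapshotStore` and its `Snapshot` record are hypothetical names, not part of JaVers; each list stands in for a per-entity table such as foo_snapshot or bar_snapshot. A query on one entity type reads a single table, but a query spanning types (here, hypothetically, "all snapshots written by one commit") has to scan every table and merge the results.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, NOT a JaVers API: one snapshot "table" per entity type,
// modelled as in-memory lists (foo_snapshot, bar_snapshot, ...).
public class PerEntitySnapshotStore {

    // A minimal snapshot row: the commit that wrote it, the entity, and its state.
    record Snapshot(long commitId, String entityType, String localId, Map<String, Object> state) {}

    // entity type -> that type's own snapshot table
    private final Map<String, List<Snapshot>> tables = new HashMap<>();

    // Route each snapshot to the table belonging to its entity type.
    public void save(Snapshot s) {
        tables.computeIfAbsent(s.entityType(), t -> new ArrayList<>()).add(s);
    }

    // Single-type query: touches exactly one table.
    public List<Snapshot> findByType(String entityType) {
        return tables.getOrDefault(entityType, List.of());
    }

    // Cross-type query: must scan every per-entity table and merge the results,
    // i.e. a fan-out over all "shards".
    public List<Snapshot> findByCommit(long commitId) {
        List<Snapshot> result = new ArrayList<>();
        for (List<Snapshot> table : tables.values()) {
            for (Snapshot s : table) {
                if (s.commitId() == commitId) {
                    result.add(s);
                }
            }
        }
        return result;
    }

    public int tableCount() {
        return tables.size();
    }

    public static void main(String[] args) {
        PerEntitySnapshotStore store = new PerEntitySnapshotStore();
        store.save(new Snapshot(1, "Foo", "f1", Map.of("bar", "a")));
        store.save(new Snapshot(1, "Bar", "b1", Map.of("name", "x")));
        store.save(new Snapshot(2, "Foo", "f1", Map.of("bar", "b")));

        System.out.println(store.findByType("Foo").size()); // 2, read from one table
        System.out.println(store.findByCommit(1).size());   // 2, merged from two tables
        System.out.println(store.tableCount());             // 2
    }
}
```

The per-type tables stay smaller, but every query that crosses entity types turns into a scan over all of them.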

2) Let's say we have

class Foo {
   String bar;
}

After some time we decide to add another field

class Foo {
   String bar;
   int baz = 0;
}

I suspect that if we update an instance of Foo, changing only bar and keeping baz = 0, JaVers will report a change saying that baz = 0 has been added. Is there anything in JaVers designed to deal with data model changes and avoid such false positives?


Solution

  • Solution 1), which you suggest, is not possible in the JaVers SQL repository, and it would be very hard to implement. Think about implementing cross-class queries, such as the child-value-objects filter, in SQL.

    In fact, it would be a kind of sharding, which is rather hard to achieve in an SQL database.

    For large databases we recommend using MongoDB (http://javers.org/documentation/repository-configuration/#mongodb-configuration). In MongoDB, sharding is available out of the box at the database level.

    Regarding question 2), I wouldn't call it a false positive. The objects {'bar':'a'} and {'bar':'a', 'baz':0} are different. You can eliminate such changes by declaring baz as an Integer and leaving it null.
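The point about question 2) can be illustrated with a toy property-by-property diff in Java. `SchemaEvolutionDiff` and `changedProperties` are hypothetical names and this is not the JaVers comparison algorithm; the sketch assumes that a null value is treated the same as a property missing from the old snapshot.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical property-by-property diff, NOT JaVers' own algorithm.
// Assumption: a null value counts the same as an absent property.
public class SchemaEvolutionDiff {

    // Returns the names of properties whose values differ between two snapshots.
    static Set<String> changedProperties(Map<String, Object> oldState, Map<String, Object> newState) {
        Set<String> names = new TreeSet<>(oldState.keySet());
        names.addAll(newState.keySet());
        Set<String> changed = new TreeSet<>();
        for (String name : names) {
            // Map.get returns null for a missing key, so "absent" equals "null" here.
            if (!Objects.equals(oldState.get(name), newState.get(name))) {
                changed.add(name);
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        // Snapshot of Foo taken before baz was added to the class.
        Map<String, Object> before = new HashMap<>();
        before.put("bar", "a");

        // New Foo with primitive int baz = 0: baz always has a value.
        Map<String, Object> withPrimitive = new HashMap<>();
        withPrimitive.put("bar", "b");
        withPrimitive.put("baz", 0);

        // New Foo with Integer baz left unset: baz stays null.
        Map<String, Object> withWrapper = new HashMap<>();
        withWrapper.put("bar", "b");
        withWrapper.put("baz", null);

        System.out.println(changedProperties(before, withPrimitive)); // [bar, baz]
        System.out.println(changedProperties(before, withWrapper));   // [bar]
    }
}
```

With the primitive field, the old and new snapshots genuinely differ on baz; with a null Integer, only the real change to bar is reported.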