language-agnostic orm design-patterns separation-of-concerns

Class should support an interface but this requires adding logic to the class in an intrusive way. Can we prevent this?

I have a C++ application that loads lots of data from a database, then executes algorithms on that data (these algorithms are quite CPU- and data-intensive that's way I load all the data before hand), then saves all the data that has been changed back to the database.

The database-part is nicely separate from the rest of the application. In fact, the application does not need to know where the data comes from. The application could even be started on file (in this case a separate file-module loads the files into the application and at the end saves all data back to the files).

Now:

the database layer only wants to save the changed instances back to the database (not the full data), therefore it needs to know what has been changed by the application.
on the other hand, the application doesn't need to know where the data comes from, hence it does not want to feel forced to keep a change-state per instance of its data.

To keep my application and its datastructures as separate as possible from the layer that loads and saves the data (could be database or could be file), I don't want to pollute the application data structures with information about whether instances were changed since startup or not.

But to make the database layer as efficient as possible, it needs a way to determine which data has been changed by the application.

Duplicating all data and comparing the data while saving is not an option since the data could easily fill several GB of memory.

Adding observers to the application data structures is not an option either since performance within the application algorithms is very important (and looping over all observers and calling virtual functions may cause an important performance bottleneck in the algorithms).

Any other solution? Or am I trying to be too 'modular' if I don't want to add logic to my application classes in an intrusive way? Is it better to be pragmatic in these cases?

How do ORM tools solve this problem? Do they also force application classes to keep a kind of change-state, or do they force the classes to have change-observers?

Solution

If you can't copy the data and compare, then clearly you need some kind of record somewhere of what has changed. The question, then, is how to update those records.

ORM tools can (if they want) solve the problem by keeping flags in the objects, saying whether the data has been changed or not, and if so what. It sounds as though you're making raw data structures available to the application, rather than objects with neatly encapsulated mutators that could update flags.

So an ORM doesn't normally require applications to track changes in any great detail. The application generally has to say which object(s) to save, but the ORM then works out what needs persisting to the DB in order to do that, and might apply optimizations there.

I guess that means that in your terms, the ORM is adding observers to the data structures in some loose sense. It's not an external observer, it's the object knowing how to mutate itself, but of course there's some overhead to recording what has changed.

One option would be to provide "slow" mutators for your data structures, which update flags, and also "fast" direct access, and a function that marks the object dirty. It would then be the application's choice whether to use the potentially-slower mutators that permit it to ignore the issue, or the potentially-faster mutators which require it to mark the object dirty before it starts (or after it finishes, perhaps, depending what you do about transactions and inconsistent intermediate states).

You would then have two basic situations:

I'm looping over a very large set of objects, conditionally making a single change to a few of them. Use the "slow" mutators, for application simplicity.
I'm making lots of different changes to the same object, and I really care about the performance of the accessors. Use the "fast" mutators, which perhaps directly expose some array in the data. You gain performance in return for knowing more about the persistence model.

There are only two hard problems in Computer Science: cache invalidation and naming things.

Phil Karlton