I am currently playing around with the Spring Data JDBC framework and I don't understand the following behaviour: when I have an aggregate root that stores a list of children and I add one child, it deletes all children and inserts them again.
This means, when my aggregate root stores 5000 entries of another entity and I add one entry, then it executes 5000 delete statements and 5001 insert statements.
Could that lead to performance issues? Is there a clean way to avoid this behaviour?
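For reference, here is a minimal sketch of the kind of mapping I mean (class, table and column names are just placeholders following the default naming strategy):

```java
import java.util.ArrayList;
import java.util.List;

import org.springframework.data.annotation.Id;
import org.springframework.data.repository.CrudRepository;

// Aggregate root that owns a list of children.
class AggregateRoot {

    @Id
    Long id;

    // The children are part of the aggregate and are saved together with the root.
    List<Child> children = new ArrayList<>();
}

class Child {
    String name;
}

interface AggregateRootRepository extends CrudRepository<AggregateRoot, Long> {}
```

Adding a single child then looks like this, and the `save` call rewrites every child row:

```java
AggregateRoot root = repository.findById(rootId).orElseThrow();
root.children.add(new Child());
repository.save(root); // DELETE of all existing child rows, then one INSERT per child
```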
Yes, this is the current behaviour. And yes, if you have an aggregate with 5000 entries, this will cause performance problems.
But if you have an aggregate with 5000 entries, you are using aggregates wrong. An aggregate is intended to be an atomic unit: it gets loaded as one and it gets persisted as one. No matter what you do, that is going to be slow. A domain like this should be modelled as two separate aggregates.
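A hedged sketch of what such a split could look like, with the former child referencing the root only by id (the names `Root`, `Entry` and the repositories are made up for illustration):

```java
import org.springframework.data.annotation.Id;
import org.springframework.data.jdbc.core.mapping.AggregateReference;
import org.springframework.data.repository.CrudRepository;

// First aggregate: the former root, without the embedded collection.
class Root {

    @Id
    Long id;
    String name;
}

// Second aggregate: the former child, now an aggregate root of its own.
// It points to Root by id instead of being loaded and saved as part of it.
class Entry {

    @Id
    Long id;
    AggregateReference<Root, Long> root;
    String name;
}

interface RootRepository extends CrudRepository<Root, Long> {}

interface EntryRepository extends CrudRepository<Entry, Long> {}
```

With that model, adding an entry is a single insert that leaves all other rows untouched:

```java
Entry entry = new Entry();
entry.root = AggregateReference.to(rootId);
entry.name = "new entry";
entryRepository.save(entry); // single INSERT
```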
Of course, even with smaller aggregates this comes at a cost. If that cost matters and you know a better strategy for your application, you can and should implement it in a custom method. For example, you could have an addEntry method that performs just the single insert for the scenario you describe in the question.
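As a sketch of what such a method could look like, Spring Data JDBC lets you declare a modifying query on the repository. The table and column names below (`child`, `aggregate_root`, `aggregate_root_key`) are assumptions based on the default naming strategy and the mapping sketched in the question; adjust them to your actual schema:

```java
import org.springframework.data.jdbc.repository.query.Modifying;
import org.springframework.data.jdbc.repository.query.Query;
import org.springframework.data.repository.CrudRepository;
import org.springframework.data.repository.query.Param;

interface AggregateRootRepository extends CrudRepository<AggregateRoot, Long> {

    // Inserts one child row directly instead of rewriting all of its siblings.
    // Column names are assumptions; aggregate_root is the back-reference column,
    // aggregate_root_key the list index column.
    @Modifying
    @Query("INSERT INTO child (aggregate_root, aggregate_root_key, name) " +
           "VALUES (:rootId, :key, :name)")
    void addEntry(@Param("rootId") Long rootId, @Param("key") int key, @Param("name") String name);
}
```

Note that with such a method you have to keep the in-memory aggregate consistent yourself (or reload it), since Spring Data JDBC does not know about the extra row.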
For the implicit question: why is Spring Data JDBC doing this? The answer is simplicity. In order to avoid this basic approach, one would need to keep track of what actually changed in an aggregate. This is essentially the approach JPA chose. It means you have to keep a reference to the loaded aggregate, and you have to deal with cases where one entity is represented by multiple instances in your application. These are the things that make JPA so complex. Spring Data JDBC takes a different approach. It basically says: "I have no idea what state the database is in right now, but once I'm done with the persist operation it has to match the state of the aggregate inside the application."
There are plans and ideas to improve this behaviour. We are currently introducing quite a few batch operations, which have the potential to speed things up. There is also a ticket to replace the delete-and-insert with a "delete/upsert" operation, which should perform better, but it will still touch all rows.