Search code examples
cachingarchitecturememory-leaksinvalidation

Interdependent caches invalidation and memory management


I am working on a Java project which makes heavy use of Observer pattern to make sure every data object state is up to date. I grew tired of maintaining this mess and am trying to implement a solution that would decouple the terrors of Observer pattern from my precious data objects.

I was able to abstract away the details of this project to say that the problem I am trying to solve is following:

There exists a set of objects representing expressions, each of which can depend on the values of other expressions.

The following two operations are required:

eval(): Retrieve the value of a given expression

This operation should return the up to date value of the expression which would be returned if all the expression dependencies would re re-evaluated right now. However, no expression should be evaluated more than once unless its cache is invalidated by the second operation:

update(): Modify a given expression

This operation invalidates the cache for the expression and for all the currently cached expressions that depend on it directly or transitively.

In addition, some convenient memory-leak-free way to manage lifetime of the expressions is required.

Desired usage example in pseudo-code:

Expression a = variable(1);
Expression b = variable(3);
Expression s = sum(a,b);
assert(4 == eval(s));    // causes evaluation of expressions a, b and s
assert(4 == eval(s));    // does not cause any evaluations,
                         //     the result should be taken from cache
setValue(a,2);           // contains update() internally, 
                         //     invalidating caches for a and s
assert(5 == eval(s));    // causes evaluation of a and s

OK, the functional part is over, here goes the memory management part.

There must be some easy way for the developer to manage the expression graph. Ideally, the allocation should be done with new Sum(a,b), developer should have the freedom to pass around expression instances as he likes, without much knowledge about cache, and deallocation should happen automatically with no effort from the developer side.

And there must not be any memory leaks. That is when an expression is deallocated, there must not be anything left in the memory associated with it. For example, if observer pattern is to be used for invalidation, the expression must be removed from all observer lists.

The question is:

What would be your approach to implementing this in your favourite language?

Non-garbage-collected and functional languages are welcome too, especially functional because I don't understand how to approach this problem in pure functional at all.

The best solution from my point of view would be the one with the least possibility of developer error.

I am intentionally not posting my current implementation details yet because I think I have found a fundamental flaw in my implementation and I don't see any way around it. I will post it later though.


Solution

  • If anyone is interested (which nobody probably is anyway), I had to lay off the global cache idea and solved the problem by making my Expressions self-caching.

    I implemented all the logic in a base class called ExpressionBase.

    The solution includes the following:

    • An expression contains a list of weak references to its dependants and notifies them on change. This way there are no memory leaks and no need to unsubscribe.
    • During an expression evaluation it automatically detects dependencies in a way similar to described in my previous answer and subscribes to them.
    • The list of dependencies is kept to prevent too early garbage-collection of intermediate expressions (the SumProxyExpression case from my previous answer). This way every weak reference has its reverse strong counterpart so that chains of weak references are not broken by GC unless these chains lead nowhere.