java, shared-data

Best approach storing and accessing Java application data


I'm in the middle of a massive refactoring project. The code has a 5000-line main class which was injected into everything, stored everything, and held all of the common code.

I'm no expert on analysis and design but I've separated out things to the best of my ability and I'm about 80% through refactoring the classes that depend on the main class to use the new classes I've created.

There are some types of data which are initialised when the application starts and accessed by pretty much everything throughout the life of the application. For instance there is a Config class which holds hundreds of parameters.

The approach I've taken is to create several singletons; the two most central are GUIData and ClientData. GUIData contains a reference to the main frame of the application, and ClientData maintains references to the config and other similar classes.

This allows me to call ClientData.getInstance().getConfig().getParam("param") from anywhere in the code, but I don't feel like this is the best approach.

I considered using individual static classes instead of these data singletons (which contain instances of the classes), but some of the classes do need constructors.

I've been googling on and off for a week trying to find a better way to do this, but somehow I always end up on threads talking about database caching.


Solution

  • Immutable (configuration) instances provide "thread-safe application-wide data access". Typesafe's config (as suggested in a comment by Brian Kent) does exactly that. Note that this does not involve static classes or singletons. Static classes and singletons may serve your purposes now, but they could prove bothersome in the future. They can be handy, of course, but try to limit their use.
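
As a rough illustration of what an immutable configuration instance might look like in plain Java, here is a minimal sketch. The class name AppConfig and its getParam method are hypothetical, chosen only to mirror the question; this is not the API of Typesafe's config library.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical immutable holder for configuration parameters. */
final class AppConfig {
    private final Map<String, String> params;

    AppConfig(Map<String, String> source) {
        // Defensive copy: later changes to the caller's map cannot leak in.
        this.params = Collections.unmodifiableMap(new HashMap<>(source));
    }

    /** Returns the value for a key, or throws if the key is unknown. */
    String getParam(String key) {
        String value = params.get(key);
        if (value == null) {
            throw new IllegalArgumentException("Unknown configuration key: " + key);
        }
        return value;
    }
}
```

Because the fields are final and the map is never modified after construction, an instance like this can be shared between threads without extra locking.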

    Initialization will have to be done after reading and parsing the configuration data. It is typically done at application startup, before other processing threads are started. The initialization will have to validate the configuration data as much as possible in order to fail fast and terminate the program if the configuration data is no good.
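
A fail-fast initialization step could be sketched roughly as follows, assuming the configuration arrives as a java.util.Properties object. The class ServerConfig and the keys server.host and server.port are made up for this example.

```java
import java.util.Properties;

/** Hypothetical configuration set that validates itself while being built. */
final class ServerConfig {
    final String host;
    final int port;

    private ServerConfig(String host, int port) {
        this.host = host;
        this.port = port;
    }

    /** Parses and validates; throws IllegalArgumentException on bad data. */
    static ServerConfig fromProperties(Properties props) {
        String host = props.getProperty("server.host");
        if (host == null || host.trim().isEmpty()) {
            throw new IllegalArgumentException("server.host is required");
        }
        String rawPort = props.getProperty("server.port");
        if (rawPort == null) {
            throw new IllegalArgumentException("server.port is required");
        }
        int port;
        try {
            port = Integer.parseInt(rawPort.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("server.port is not a number: " + rawPort, e);
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("server.port out of range: " + port);
        }
        return new ServerConfig(host.trim(), port);
    }
}
```

At startup, the caller would let the exception propagate (or log it and exit) before any worker threads are started.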

    Having a lot of configuration data bundled together can create "hidden lines of communication". E.g. you update one value and the application fails because it required updates to other values as well. It's perfectly fine to put all configuration data in one file and load it from there, but your application (with hundreds of configuration options) should divide the configuration data into sets that are used by different parts of your application. This improves isolation, helps unit testing, and makes it possible to change the application in the future without getting too many nasty surprises.
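
One way to picture this split: a single file is loaded once, then carved into small per-module sets. The sketch below assumes a java.util.Properties source; MailConfig, CacheConfig, ConfigSets, the property keys, and the default values are all hypothetical.

```java
import java.util.Properties;

/** Hypothetical set used only by the mail module. */
final class MailConfig {
    final String smtpHost;
    MailConfig(String smtpHost) { this.smtpHost = smtpHost; }
}

/** Hypothetical set used only by the cache module. */
final class CacheConfig {
    final int maxEntries;
    CacheConfig(int maxEntries) { this.maxEntries = maxEntries; }
}

/** Loads one properties source and divides it into per-module sets. */
final class ConfigSets {
    final MailConfig mail;
    final CacheConfig cache;

    private ConfigSets(MailConfig mail, CacheConfig cache) {
        this.mail = mail;
        this.cache = cache;
    }

    static ConfigSets load(Properties props) {
        // Each module only ever sees the keys under its own prefix.
        MailConfig mail = new MailConfig(props.getProperty("mail.smtpHost", "localhost"));
        CacheConfig cache = new CacheConfig(
                Integer.parseInt(props.getProperty("cache.maxEntries", "1000")));
        return new ConfigSets(mail, cache);
    }
}
```

A unit test for the cache module then only needs a CacheConfig, not the whole file.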

    There are two ways to use a set of configuration data:

    1. From within an object, call a singleton: Settings.getInstance().getConfigForThisModule().
    2. Provide each object that uses configuration data with that data, via the constructor or via setConfig(ConfigForThisModule config).

    The first approach depends on a convention not to call Settings.getInstance().getConfigForACompletelyUnrelatedModule(), which could be a weakness. The second approach is more in line with "dependency injection" and could be more future-proof. You could mix both approaches while you are refactoring; just make sure to be consistent (e.g. only use the singleton approach for configuration data that is used in all parts of the application).
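
The second approach might look roughly like this. ExportConfig and Exporter are hypothetical names standing in for "ConfigForThisModule" and an object that uses it.

```java
/** Hypothetical configuration set for one module. */
final class ExportConfig {
    final String outputDirectory;
    final int batchSize;

    ExportConfig(String outputDirectory, int batchSize) {
        this.outputDirectory = outputDirectory;
        this.batchSize = batchSize;
    }
}

/** Hypothetical consumer that receives its configuration from outside. */
final class Exporter {
    private final ExportConfig config;

    // The configuration arrives through the constructor; no global lookup.
    Exporter(ExportConfig config) {
        this.config = config;
    }

    String describe() {
        return "Exporting to " + config.outputDirectory
                + " in batches of " + config.batchSize;
    }
}
```

Since Exporter cannot reach any other module's settings, the convention from the first approach is enforced by the compiler, and a test can pass in whatever ExportConfig it needs.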

    To further improve your design for using the configuration data, keep the following (likely) future functional requirement in mind: when the configuration file is updated, configuration data is reloaded and used in the application. Most logging frameworks manage to support this functional requirement without affecting the performance of multi-threaded applications. Among other things, it requires the following of your application:

    • If the new configuration data is no good, the program is not terminated; instead, an error is logged and the old configuration data remains in use. Your initialization procedure will need to handle both "load at fresh start" and "reload" scenarios. The main thing to take away from this is that your initialization procedure needs to be reusable and should not affect other (running) parts of your application (isolation, again).
    • Long-lived objects must not keep a local copy of configuration data or a reference to an instance of ConfigForThisModule. Instead, Settings.getInstance()... (or some other method that can return an updated instance) should be called regularly.
    • Replacing old configuration with new configuration must not result in errors. Technically, replacing the configuration is as simple as updating an AtomicReference with a new configuration instance, which is then returned by Settings.getInstance().... But this is also where the isolation of the configuration data sets is tested: there should be no problem using an old set in one module and a new set in another module at the same time.
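
The reload requirements above can be sketched with java.util.concurrent.atomic.AtomicReference. This is only an outline under assumptions: ModuleConfig, ReloadableSettings, and the reload method are hypothetical names, logging is reduced to System.err, and the holder is shown as a plain object rather than a singleton.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical immutable configuration set that validates itself. */
final class ModuleConfig {
    final int timeoutMillis;

    ModuleConfig(int timeoutMillis) {
        if (timeoutMillis <= 0) {
            throw new IllegalArgumentException("timeoutMillis must be positive: " + timeoutMillis);
        }
        this.timeoutMillis = timeoutMillis;
    }
}

/** Hypothetical holder that swaps in new configuration atomically. */
final class ReloadableSettings {
    private final AtomicReference<ModuleConfig> current;

    ReloadableSettings(ModuleConfig initial) {
        this.current = new AtomicReference<>(initial);
    }

    /** Callers fetch the instance on each use instead of caching it. */
    ModuleConfig getConfigForThisModule() {
        return current.get();
    }

    /** Tries to load new configuration; keeps the old one if loading fails. */
    boolean reload(Supplier<ModuleConfig> loader) {
        try {
            ModuleConfig fresh = loader.get(); // parse and validate first
            current.set(fresh);                // then publish in one step
            return true;
        } catch (RuntimeException e) {
            System.err.println("Configuration reload failed, keeping old configuration: "
                    + e.getMessage());
            return false;
        }
    }
}
```

The new instance is fully built and validated before it is published, so readers see either the complete old set or the complete new set, never a mixture.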

    Configuration data can be seen as a sort of "global state". With that in mind, further design points on what to do and what to avoid (partially blatantly copied to this answer) are discussed in the following two questions: