Search code examples
unit-testinglegacy

Tips for testing data intensive legacy application


I'm working on a very large, data-intensive legacy application. Both the code base & database are massive in scale. A great deal of the business logic is spread across all of the tiers including in stored procedures.

Does anybody have any suggestions on how to begin applying "unit" tests (technically integration tests because they need to test across tiers for a single aspect of almost any given process) into the application in an efficient way? The current architecture does not easily support any type of injection or mocking. New code is being written to facilitate testing, but what about the legacy code? Because of the strong dependency on the data itself and business logic in the database, I'm currently using inline sql to find data to use for testing but these are time consuming. Creating views and/or stored procedures will not suffice.

What approaches have you taken (if applicable)? What worked? What didn't & why? Any suggestions would be appreciated. Thanks.


Solution

  • Get a copy of Working Effectively with Legacy Code by Michael Feathers. It is full of useful advice for working with large, untested codebases.

    Another good book is Object Oriented Reengineering Patterns. Most of the book is not specific to object-oriented software. The full text is available for free download in PDF format.

    From my own experience: try to...

    • Automate the build and deployment
    • Get the database schema into version control, if it isn't yet. Usually databases include reference data that the transactional code needs to exist before it can work. Get this under version control too. Tools like dbdeploy can help you easily rebuild a schema and reference data from a sequence of deltas.
    • Install a version of the database (and any other infrastructure services) onto your development workstation. This will let you work on the database without continually having to go through DBAs. It's also faster than using a schema on a shared server in a remote datacentre. All major commercial database servers have free (as in beer) development versions that work on Windows (if you're stuck in the unenviable situation of developing on Windows and deploying on Unix).
    • Before starting work on an area of the code, write end-to-end tests that roughly cover the behaviour of the area you're working on. An end-to-end test should exercise the system from outside -- by controlling its user interface or interacting through network services -- so you won't need to change the code to put it into place. It will act as an (imperfect) regression test and give you more confidence to refactor the internals of the system towards a structure that is easier to unit test.
    • If there are manual test plans, read them and see what can be automated. Most manual test plans are almost entirely scripted and so are low-hanging fruit for automation
    • Once you've got end-to-end tests coverage, refactor the code into more loosely coupled units as you modify and/or extend it. Surround those units with unit tests.

    Things to avoid:

    • Copying data from the production database into the environment you use for automated testing. This will make your tests unpredictable. Sure, run the system against a copy of production data, but use that for exploratory testing, not regression testing.
    • Rolling back transactions at the end of tests to isolate tests from one another. This will not test behaviour that only happens when transactions are committed, and will throw away data that is valuable for diagnosing test failures. Instead, tests should ensure the database is in a known initial state when they start.
    • Creating a "tiny" data set for tests to run against. This makes tests hard to understand because they cannot be read as a single unit. The "tiny" data set soon grows very large as you add tests for different scenarios. Instead, tests can insert data into the database to set up the test-fixture.