database, unit-testing, testing, entity-attribute-value

Generating unit test data for a complex DB (EAV)


I'm working with a legacy application that makes some use of the well-known (and dreaded) data modeling pattern EAV (entity-attribute-value). This has made it difficult to choose a data generation strategy for unit testing the DAL. Why? Because, in addition to the normal FK/PK constraints between tables (which we use where possible), there are additional relationships and constraints that only the application layer knows about and enforces.

According to this article, the easiest data tests to write and maintain are those that rely on an externally defined, static data set. However, creating "by hand" a dataset that incorporates the relationships already modeled in my application layer would be a DRY violation and a major PITA to boot. On the other hand, using my application layer to generate test data feels even more distasteful, as it violates unit testing's prime directive (isolation): a regression in the application layer could cause my DAL tests to throw bogus failures.

For this reason, I'm leaning towards the static dataset option, unless others who have had to unit test an EAV model can chime in with alternatives.

Many thanks.


Solution

  • If the DAL is not responsible for enforcing certain application rules in the data store, then there is no need to ensure that the test data conforms to those higher-level rules. The unit tests need only verify that the DAL enforces the rules that are its own responsibility, presumably things like respecting database constraints and data types. The data need only satisfy the preconditions of the DAL itself to constitute a valid test case. The higher-level rules will be checked in the application layer's unit tests, in which the DAL will be stubbed or mocked out. Under these assumptions, either a static dataset or one generated using trivial code will likely be adequate for the DAL tests.

    It may well be that the "legacy" nature of the application makes it difficult, if not impossible, to unit test the application and DAL layers separately. In effect, the two layers collectively would be a single (if complex) "unit". In this case, it would be acceptable (or perhaps "tolerable" is the right word) to generate the test data using the application layer as a matter of expediency. Such generation would, in effect, constitute yet more test cases for the conglomerate "unit". DAL failures caused by application layer regressions should be investigated as candidate bugs in one, the other, or both layers. Any time spent prying these two layers apart into separate units would likely pay dividends in the long run.
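
As a rough illustration of the first scenario, here is a minimal Python sketch. Everything in it is an assumption made up for the example, not taken from the original application: the three-table EAV schema, the in-memory SQLite database, the `EavDal` class, and the application rule `product_weight`. The DAL tests load a small static dataset that satisfies only the database constraints (one row deliberately breaks the application rule), while the application-layer tests replace the DAL with a mock.

```python
import sqlite3
import unittest
from unittest import mock

# Hypothetical EAV schema, invented for this illustration.
SCHEMA = """
CREATE TABLE entity (id INTEGER PRIMARY KEY, kind TEXT NOT NULL);
CREATE TABLE attribute (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE attr_value (
    entity_id INTEGER NOT NULL REFERENCES entity(id),
    attribute_id INTEGER NOT NULL REFERENCES attribute(id),
    val TEXT,
    PRIMARY KEY (entity_id, attribute_id)
);
"""

# Static dataset: satisfies keys and types only. The non-numeric weight
# violates an application-level rule, which the DAL does not care about.
STATIC_DATA = """
INSERT INTO entity VALUES (1, 'product');
INSERT INTO attribute VALUES (1, 'color'), (2, 'weight');
INSERT INTO attr_value VALUES (1, 1, 'red'), (1, 2, 'not-a-number');
"""


class EavDal:
    """Hypothetical DAL: reads and writes attribute values, nothing more."""

    def __init__(self, conn):
        self.conn = conn

    def get_attributes(self, entity_id):
        rows = self.conn.execute(
            "SELECT a.name, v.val FROM attr_value v "
            "JOIN attribute a ON a.id = v.attribute_id "
            "WHERE v.entity_id = ?",
            (entity_id,),
        )
        return dict(rows)

    def set_attribute(self, entity_id, attribute_id, val):
        # The connection context manager commits on success, rolls back on error.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO attr_value VALUES (?, ?, ?)",
                (entity_id, attribute_id, val),
            )


def product_weight(dal, entity_id):
    """Hypothetical application-layer rule: a product's weight is numeric."""
    return float(dal.get_attributes(entity_id)["weight"])


class DalTest(unittest.TestCase):
    """DAL tests run against the static dataset; no application code involved."""

    def setUp(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
        self.conn.executescript(SCHEMA + STATIC_DATA)
        self.dal = EavDal(self.conn)

    def tearDown(self):
        self.conn.close()

    def test_reads_attributes_as_stored(self):
        self.assertEqual(
            self.dal.get_attributes(1),
            {"color": "red", "weight": "not-a-number"},
        )

    def test_rejects_value_for_unknown_entity(self):
        with self.assertRaises(sqlite3.IntegrityError):
            self.dal.set_attribute(99, 1, "blue")


class AppLayerTest(unittest.TestCase):
    """Application-layer tests mock the DAL, so no database is needed."""

    def test_weight_is_parsed(self):
        dal = mock.Mock(spec=EavDal)
        dal.get_attributes.return_value = {"weight": "2.5"}
        self.assertEqual(product_weight(dal, 1), 2.5)
        dal.get_attributes.assert_called_once_with(1)

    def test_non_numeric_weight_is_rejected(self):
        dal = mock.Mock(spec=EavDal)
        dal.get_attributes.return_value = {"weight": "not-a-number"}
        with self.assertRaises(ValueError):
            product_weight(dal, 1)
```

Saved as a module, the tests can be run with `python -m unittest`. The point of the split is that a regression in `product_weight` can only fail `AppLayerTest`, and a regression in `EavDal` can only fail `DalTest`, so the static dataset never has to encode the application's rules.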