How to keep track of performance testing

I'm currently doing performance and load testing of a complex many-tier system investigating the effect of different changes, but I'm having problems keeping track of everything:

There are many copies of different assemblies
- Orignally released assemblies
- Officially released hotfixes
- Assemblies that I've built containing further additional fixes
- Assemblies that I've build containing additional diagnostic logging or tracing
There are many database patches, some of the above assemblies depend on certain database patches being applied
Many different logging levels exist, in different tiers (Application logging, Application performance statistics, SQL server profiling)
There are many different scenarios, sometimes it is useful to test only 1 scenario, other times I need to test combinations of different scenarios.
Load may be split across multiple machines or only a single machine
The data present in the database can change, for example some tests might be done with generated data, and then later with data taken from a live system.
There is a massive amount of potential performance data to be collected after each test, for example:
- Many different types of application specific logging
- SQL Profiler traces
- Event logs
- DMVs
- Perfmon counters
The database(s) are several Gb in size so where I would have used backups to revert to a previous state I tend to apply changes to whatever database is present after the last test, causing me to quickly loose track of things.

I collect as much information as I can about each test I do (the scenario tested, which patches are applied what data is in the database), but I still find myself having to repeat tests because of inconsistent results. For example I just did a test which I believed to be an exact duplicate of a test I ran a few months ago, however with updated data in the database. I know for a fact that the new data should cause a performance degregation, however the results show the opposite!

At the same time I find myself sepdning disproportionate amounts of time recording these all these details.

One thing I considered was using scripting to automate the collection of performance data etc..., but I wasnt sure this was such a good idea - not only is it time spent developing scripts instead of testing, but bugs in my scripts could cause me to loose track of things even quicker.

I'm after some advice / hints on how better to manage the test environment, in particular how to strike a balance between collecting everything and actually getting some testing done at the risk of missing something important?

Solution

Scripting the collection of the test parameters + environment is a very good idea to check out. If you're testing across several days, and the scripting takes a day, it's time well spent. If after a day you see it won't finish soon, reevaluate and possibly stop pursuing this direction.

But you owe it to yourself to try it.