algorithm, language-agnostic, dependencies, cycle

Is there an efficient algorithm to detect dependency cycles in complex data structures?


I have a smallish, complex database (a few million records spread over low thousands of tables). The records can be thought of as business rules. Users can define their own rules in terms of existing rules (including other user-defined rules). These rules depend on other rules, sometimes via complex paths. The dependencies form an extended network rather than a hierarchy.

I am looking for an algorithm to determine whether a newly defined rule (or set of rules) is itself cyclic, or whether it creates cycles when taken together with the existing rules.

I need an algorithm that is efficient in the following circumstances:

  1. The result of the algorithm need only be a boolean: true if there is a cycle, false otherwise.
  2. The existing database can be assumed to be cycle-free.
  3. Processing can stop as soon as a cycle is found. The usual case (perhaps 95%?) will be that there is no cycle. Unfortunately, this is precisely the case where (I think) processing will have to follow every possible path from the proposed new rule in order to determine that there is no cycle.
  4. This algorithm is to be used to validate new user-defined rules as they are entered into the database. It needs to be as quick as possible for the usual case; I don't want this validation to become a bottleneck in the creation process.
  5. Obtaining data is comparatively expensive, usually involving one or more queries, some of which are quite complex. The newly defined rule set can be constrained so as to be completely available in memory. If there are other constraints on the input of new rules that would make this check more efficient, I am not aware of what they may be.

EDIT

I am accepting Nick's answer, with one modification. Storing the dependencies is a very easy modification to the database. I am only going to store the direct dependencies, rather than all dependencies whether direct or indirect. I can view the two sets of dependencies, C, D, F, G and X, Y, Z (in Nick's answer), as tree structures, and use one of the various techniques for deriving hierarchical structures from a single-level dependency table. I think the cost of this will be acceptable in this context.
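
As a rough illustration of checking against stored direct dependencies only, here is a minimal sketch of a depth-first search that starts from the new rules and stops at the first cycle. It relies on the stated assumption that the existing database is cycle-free, so any cycle must pass through at least one new rule. The names `creates_cycle`, `new_rules` and `fetch_direct_deps` are hypothetical; `fetch_direct_deps` stands in for whatever (expensive) query returns the direct dependencies of an existing rule, and each rule is fetched at most once. It also assumes that any existing rule whose dependency list changes is included in `new_rules`.

```python
from typing import Callable, Dict, Iterable, Set


def creates_cycle(
    new_rules: Dict[str, Set[str]],
    fetch_direct_deps: Callable[[str], Iterable[str]],
) -> bool:
    """Return True if adding new_rules introduces a dependency cycle.

    new_rules maps each new (or changed) rule to its direct dependencies.
    fetch_direct_deps is a hypothetical lookup for existing rules.
    """
    cache: Dict[str, Set[str]] = {}

    def deps_of(rule: str) -> Set[str]:
        # In-memory definitions take precedence over stored ones.
        if rule in new_rules:
            return new_rules[rule]
        if rule not in cache:
            cache[rule] = set(fetch_direct_deps(rule))  # one fetch per rule
        return cache[rule]

    IN_PROGRESS, DONE = 1, 2
    state: Dict[str, int] = {}

    # Only the new rules need to be used as starting points.
    for start in new_rules:
        if start in state:
            continue
        state[start] = IN_PROGRESS
        stack = [(start, iter(deps_of(start)))]
        while stack:
            node, remaining = stack[-1]
            for child in remaining:
                if state.get(child) == IN_PROGRESS:
                    return True  # child is on the current path: cycle
                if child not in state:
                    state[child] = IN_PROGRESS
                    stack.append((child, iter(deps_of(child))))
                    break
            else:
                # All dependencies of node explored without finding a cycle.
                state[node] = DONE
                stack.pop()
    return False
```

In the usual no-cycle case this still has to visit every rule reachable from the new ones, but it visits each rule once rather than once per path, which may matter when dependencies form a network rather than a tree.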


Solution

  • I hope I understood your problem correctly:

    Let's assume you add rule A to the database; then you also add dependency information such as: A depends on C, D, F, G, and X, Y, Z depend on A.

    I would assume there is no way of detecting a cycle at insertion time without really looking at the whole structure, which you say is disallowed.

    So my idea would be to have everything precomputed and stored, i.e. for each rule R, store all the other rules it depends on (not only directly, but also indirectly). Now, when you insert rule A, simply get all the dependencies of C, D, F, G and see whether they include any of X, Y, Z or A. If they don't, there is no cycle, and you can safely add A to your ruleset and store all the dependencies of C, D, F, G, plus C, D, F, G themselves, as A's dependencies.

    This of course requires some restructuring (and rebuilding) of the database.
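
The precomputed approach described above could look roughly like the following sketch. It assumes the full (direct and indirect) dependency set of every rule is held in a dictionary; in practice this would be a database table. The names `try_add_rule`, `closure`, `direct_deps` and `dependents` are made up for illustration. The check also treats the direct dependencies themselves as part of A's dependency set, so "A depends on X and X depends on A" is caught.

```python
from typing import Dict, Iterable, Set


def try_add_rule(
    closure: Dict[str, Set[str]],
    new_rule: str,
    direct_deps: Iterable[str],
    dependents: Iterable[str] = (),
) -> bool:
    """Add new_rule if it creates no cycle; return False and change nothing otherwise.

    closure maps each existing rule to ALL rules it depends on
    (directly or indirectly). dependents are existing rules that
    will depend on new_rule; they are assumed to be keys of closure.
    """
    direct = set(direct_deps)
    dependents = set(dependents)

    # Everything new_rule would depend on: its direct dependencies
    # plus their precomputed dependency sets.
    all_deps = set(direct)
    for dep in direct:
        all_deps |= closure.get(dep, set())

    # A cycle exists if new_rule would depend on itself or on a rule
    # that depends on it.
    if new_rule in all_deps or all_deps & dependents:
        return False

    closure[new_rule] = all_deps

    # Rules that depend on new_rule, directly or through one of its
    # dependents, inherit new_rule and all of its dependencies.
    for rule, deps in closure.items():
        if rule in dependents or deps & dependents:
            deps |= all_deps | {new_rule}
    return True
```

For example, with `closure = {"C": {"D"}, "D": set(), "F": set(), "G": set(), "X": set()}`, calling `try_add_rule(closure, "A", ["C", "F", "G"], ["X"])` succeeds and records C, D, F, G as A's dependencies. The trade-off is the one noted above: the check itself needs only a few lookups, but the stored sets must be built once and kept up to date on every insertion.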