Search code examples
rubytopological-sort

Find and sort topologically all trees including a given group of nodes from a dependency graph


Assume I have the following graph of dependencies ( A => [B] means A depends/requires B to update its value before the value of A can be updated). I am using a Ruby hash and the TSort module for topological operations.

graph = {
  a: OpenStruct.new(dependencies: []),
  b: OpenStruct.new(dependencies: [:a]),
  c: OpenStruct.new(dependencies: [:b]),
  d: OpenStruct.new(dependencies: []),
  e: OpenStruct.new(dependencies: [:d]),
  f: OpenStruct.new(dependencies: []),
}
# Which gives 3 dependency trees
:c => :b => :a, :e => :d, :f 

Given any group of nodes, I need to find and sort topologically the nodes that need to be updated, including (recursively) the parents and children of this group of nodes. That is, all the dependency trees that include those nodes, and not just the dependencies up to those nodes in the trees.

Here are some examples (regarding the output, I can cope with either an array of TSorted graphs, or just a flattened version, I don't care, cf example 3) but the array of tsorted graphs is easier to spec as you'll see below.

desired_function(:a) == [:c,:b,:a] # or [[:c,:b,:a]]
desired_function(:b) == [:c,:b,:a] # or [[:c,:b,:a]]
desired_function(:c) == [:c,:b,:a] # or [[:c,:b,:a]]
desired_function(:c,:d) == [:c,:b,:a,:d,:e] # or [:d,:e,:c,:b,:a] or [[:d,:e], [:c,:b,:a]], or [[:c,:b,:a], [:d,:e]]
desired_function(:f) == [:f] # or [[:f]]

I have tried implementing a TSort algorithm, but I'm not sure how I can specify that I also need to include the parents of the node I am requesting the tsort from

My current implementation (on the graph given above)

def incomplete_function_to_get_tsorted_subgraphs(graph, node_names) 
  each_node = lambda { |&b| graph.slice(*node_names).each_key(&b) }
  each_child = lambda do |node_name, &b|
    graph[node_name].dependencies.each(&b)
  end
  TSort.tsort(each_node, each_child)
end

It is incomplete as currently it will only get dependencies up to the requested nodes incomplete_function_to_get_tsorted_subgraphs(graph, :b) # returns [:a, :b] instead of [:a,:b,:c] If I remove the slice then it just becomes a tsort on the full graph which isn't what I want (and I would need to prune it somehow)

The specs for what I want to achieve (assuming we are returning arrays of tsorted dependency graphs)

expect(complete_function_to_get_tsorted_subgraphs(:b)).to include([:a,:b,:c])

expect(complete_function_to_get_tsorted_subgraphs(:b,:d).to 
  include([:a,:b,:c], [:d, :e])

Am I going the right way with using the TSort algorithm, and can you help me fix my function, or do I have to go a totally different path to get what I want ?


Solution

  • What I ended up doing :

    From the stat I need to compute, always go up to the leaf nodes with the most dependencies (so if I need to compute :b, always assume you need find :c) with a function like this (the implementation can most likely be improved)

    def find_leaf_nodes(node_name)
      parents = all_nodes.select { |_,node| node.dependencies.include?(name) }
      if parents.any?
        parents.flat_map { |parent_name, _| find_leaf_nodes(parent_name) }
      else
        name
      end
    end
    

    And then just build the TSorted dependency list from here using the code I mentioned in the question