Search code examples
pythonalgorithmchartsnetworkx

Divide graph into subgraphs based on attribute of border vertices


Finding Clusters in Directed Graphs with Specific Border Node Conditions

I'm working with a directed graph (DiGraph) in Python, using the NetworkX library. My graph consists of several nodes representing different positions (labelled "WITH" and "AGAINST") on various topics or categories (numbered 1 through 6). Some of these nodes have a special attribute (special_attribute=True), which plays a crucial role in my analysis.

Objective

I need to develop an algorithm to identify clusters within this graph under the following conditions:

  • A cluster is a group of nodes that are strongly connected.
  • The border or peripheral nodes of these clusters must have the special_attribute=True attribute. It's important to note that these border nodes can be part of multiple clusters.

Example Graph Structure

The graph has nodes like 1-WITH, 1-AGAINST, 2-WITH, etc., with some having the special_attribute attribute set. A small example graph is shown below, with the red nodes having the special attribute example graph

Question

How can I implement an algorithm in Python that finds all such clusters in the graph, adhering to the condition that the clusters' border nodes have the special_attribute=True attribute? In the example above, the clusters would be

4-AGAINST, 4-WITH, 6-WITH, 6-AGAINST, 5-WITH, 5-AGAINST, 2-WITH, 2-AGAINST

and

5-WITH, 5-AGAINST, 2-WITH, 2-AGAINST, 3-WITH,3-AGAINST, 1-WITH,1-AGAINST

Note: The actual graph I am working on has around 40.000 vertices and 70.000 edges.

Code to copy-paste for creating the graph above:

digraph = nx.DiGraph()
digraph.add_node('1-WITH')
digraph.add_node('1-AGAINST')
digraph.add_node('2-WITH', specialAttribute=True)
digraph.add_node('2-AGAINST', specialAttribute=True)
digraph.add_node('3-WITH')
digraph.add_node('3-AGAINST')
digraph.add_node('4-WITH')
digraph.add_node('4-AGAINST')
digraph.add_node('5-WITH', specialAttribute=True)
digraph.add_node('5-AGAINST', specialAttribute=True)
digraph.add_node('6-WITH')
digraph.add_node('6-AGAINST')


# Add edges 
digraph.add_edges_from([('1-AGAINST', '1-WITH'),('1-WITH', '2-WITH'),('1-WITH', '3-WITH'), ('2-WITH','4-AGAINST'), ('4-AGAINST', '5-AGAINST'),  ('4-AGAINST', '6-AGAINST'),  ('6-AGAINST', '6-WITH'),
                        ('6-WITH', '5-AGAINST'),  ('6-WITH', '4-WITH'),  ('5-AGAINST', '3-AGAINST'), ('3-AGAINST', '2-WITH'), ('3-AGAINST', '1-AGAINST'), ('3-WITH', '5-WITH'), ('4-WITH','2-AGAINST'),
                        ('2-AGAINST','1-AGAINST'), ('2-AGAINST','3-WITH'), ('5-WITH','4-WITH'), ('5-WITH','6-AGAINST')
])```

Solution

  • This: " the graph would be partitioned into unconnected subgraphs with one cluster per subgraph?". The end goal is to perform a demanding optimization problem on the graph, where the nodes with the specialAttribute can be seen as observations, and this logic would then make each group independent, this reducing computational issues.

    1. Induce a subgraph that does not contain your "special" nodes.
    2. Determine the unconnected components.
    non_special_nodes = [node for node in digraph if "specialAttribute" not in digraph.nodes[node]]  
    subgraph = nx.subgraph(digraph, non_special_nodes)
    components = list(nx.connected_components(subgraph.to_undirected()))
    print(components)
    # [{'1-AGAINST', '1-WITH', '3-AGAINST', '3-WITH'},
    #  {'4-AGAINST', '4-WITH', '6-AGAINST', '6-WITH'}]