I'm working with a directed graph (DiGraph) in Python, using the NetworkX library. My graph consists of several nodes representing different positions (labelled "WITH" and "AGAINST") on various topics or categories (numbered 1 through 6). Some of these nodes have a special attribute (special_attribute=True
), which plays a crucial role in my analysis.
I need to develop an algorithm to identify clusters within this graph under the following conditions:
special_attribute=True
attribute. It's important to note that these border nodes can be part of multiple clusters.The graph has nodes like 1-WITH
, 1-AGAINST
, 2-WITH
, etc., with some having the special_attribute
attribute set. A small example graph is shown below, with the red nodes having the special attribute
How can I implement an algorithm in Python that finds all such clusters in the graph, adhering to the condition that the clusters' border nodes have the special_attribute=True attribute? In the example above, the clusters would be
4-AGAINST, 4-WITH, 6-WITH, 6-AGAINST, 5-WITH, 5-AGAINST, 2-WITH, 2-AGAINST
and
5-WITH, 5-AGAINST, 2-WITH, 2-AGAINST, 3-WITH,3-AGAINST, 1-WITH,1-AGAINST
Note: The actual graph I am working on has around 40.000 vertices and 70.000 edges.
Code to copy-paste for creating the graph above:
digraph = nx.DiGraph()
digraph.add_node('1-WITH')
digraph.add_node('1-AGAINST')
digraph.add_node('2-WITH', specialAttribute=True)
digraph.add_node('2-AGAINST', specialAttribute=True)
digraph.add_node('3-WITH')
digraph.add_node('3-AGAINST')
digraph.add_node('4-WITH')
digraph.add_node('4-AGAINST')
digraph.add_node('5-WITH', specialAttribute=True)
digraph.add_node('5-AGAINST', specialAttribute=True)
digraph.add_node('6-WITH')
digraph.add_node('6-AGAINST')
# Add edges
digraph.add_edges_from([('1-AGAINST', '1-WITH'),('1-WITH', '2-WITH'),('1-WITH', '3-WITH'), ('2-WITH','4-AGAINST'), ('4-AGAINST', '5-AGAINST'), ('4-AGAINST', '6-AGAINST'), ('6-AGAINST', '6-WITH'),
('6-WITH', '5-AGAINST'), ('6-WITH', '4-WITH'), ('5-AGAINST', '3-AGAINST'), ('3-AGAINST', '2-WITH'), ('3-AGAINST', '1-AGAINST'), ('3-WITH', '5-WITH'), ('4-WITH','2-AGAINST'),
('2-AGAINST','1-AGAINST'), ('2-AGAINST','3-WITH'), ('5-WITH','4-WITH'), ('5-WITH','6-AGAINST')
])```
This: " the graph would be partitioned into unconnected subgraphs with one cluster per subgraph?". The end goal is to perform a demanding optimization problem on the graph, where the nodes with the specialAttribute can be seen as observations, and this logic would then make each group independent, this reducing computational issues.
non_special_nodes = [node for node in digraph if "specialAttribute" not in digraph.nodes[node]]
subgraph = nx.subgraph(digraph, non_special_nodes)
components = list(nx.connected_components(subgraph.to_undirected()))
print(components)
# [{'1-AGAINST', '1-WITH', '3-AGAINST', '3-WITH'},
# {'4-AGAINST', '4-WITH', '6-AGAINST', '6-WITH'}]