python, pandas, ubuntu, matrix, networkx

Python and Networkx: Randomly Highlight Nodes by ignoring Patterns in CSV files


I have 4 CSV files, each containing a matrix with a slightly different pattern of 'X' entries.
Each matrix contains between 1 and N-1 'X' entries, where N is the size of the matrix, (5,5) in this case:

pattern-1.csv        |   pattern-2.csv        |   pattern-3.csv        |   pattern-4.csv
                     |                        |                        |
1   2   3   4   5    |   1   2   3   X   X    |   1   2   3   4   5    |   X   2   3   4   5
X   7   8   9   10   |   6   7   8   9   X    |   6   7   X   9   10   |   6   X   8   9   10
X   12  13  14  15   |   13  12  13  14  15   |   7   12  X   14  15   |   7   12  13  14  15
16  17  X   19  20   |   16  17  18  19  20   |   16  17  X   19  20   |   16  17  14  19  20
21  22  X   24  25   |   21  22  23  24  25   |   21  22  X   24  25   |   21  22  23  24  25

I wish to generate (N,N) graphs that randomly highlight between 1 and N-1 nodes (by highlight I mean a different colour for the nodes). On each random run / iteration, the sum of the labels of the highlighted nodes should be within the range 40 to 80 (for example, 3 randomly highlighted nodes with 12 + 15 + 20 = 47, and so on...).

However, the highlighted nodes in the generated graphs SHOULD NOT match any of the 'X' patterns in the above 4 CSV files.
Basically, the code should take the 4 CSV files as a dataset and generate such graphs while respecting the constraints above. Eventually there will be more CSV files in the dataset.

I have a simple Python code that does the opposite of what I wish to achieve with a single CSV file:

from tabulate import tabulate
import matplotlib.pyplot as plt
import pandas as pd
import networkx as nx

N = 5
G = nx.grid_2d_graph(N,N)

df = pd.read_csv('pattern-1.csv', index_col=None, header=None, dtype=str)

pos = dict( (n,n) for n in G.nodes() )

colors = [ "red" if df.loc[N - 1 - j, i] != "X" else "green" for i, j in G.nodes() ]

labels = dict( ((i,j), i + 1 + (N - 1 - j) * N) for i,j in G.nodes() )

print(tabulate(df, showindex=False, tablefmt="plain"))
#print("Sum of the nodes: " ???)

G.remove_edges_from(G.edges())
nx.draw_networkx(G, pos=pos, labels=labels, node_color=colors)
plt.axis('off')
plt.show()

Output of the above:

Figure-1

Any ideas on how I can achieve these goals?
Also, does it make sense to turn the individual patterns in the 4 CSV files into lists and put them into a single CSV file for simplicity?
If yes, what would the code look like?

Thanks in advance!

P.S. This is an extension of my previous question here:
Python and Networkx: Retrieve Node Label Number and Compare it to a CSV Value
Thanks again to @AndrejKesely

EDIT:

I still need help with randomly highlighting nodes that don't match the 'X' patterns in the matrices, but I was able to figure out how to check that the total sum falls within a range:

import random

# Labels run from 1 to N*N inclusive; range() excludes its upper bound
pick = random.sample(range(1, N * N + 1), N - 1)
total = sum(pick)
if 40 <= total <= 80:
    print("True: ", total)
else:
    print("False: ", total)

Solution

  • Here is one way to do it with the random and itertools modules from Python's standard library:

    import random
    from itertools import combinations
    
    import matplotlib.pyplot as plt
    import networkx as nx
    import pandas as pd
    
    df1 = pd.DataFrame(
        {
            1: [1, "X", "X", 16, 21],
            2: [2, 7, 12, 17, 22],
            3: [3, 8, 13, "X", "X"],
            4: [4, 9, 14, 19, 24],
            5: [5, 10, 15, 20, 25],
        }
    )
    
    df2 = pd.DataFrame(
        {
            1: [1, 6, 13, 16, 21],
            2: [2, 7, 12, 17, 22],
            3: [3, 8, 13, 18, 23],
            4: ["X", 9, 14, 19, 24],
            5: ["X", "X", 15, 20, 25],
        }
    )
    
    N = 5
    MIN_NUM_OF_HIGHLIGHTED_NODES = 1
    MIN_RANGE = 40
    MAX_RANGE = 80
    
    dfs = (df1, df2)
    
    x_labels = [
        i + 1
        for df in dfs
        for i, v in enumerate(item for row in df.to_numpy() for item in row)
        if v == "X"
    ]
    
    # Labels run from 1 to N * N; keep only those not marked "X" in any pattern
    free_labels = [i for i in range(1, N * N + 1) if i not in x_labels]
    max_num_of_highlighted_nodes = min(N - 1, len(free_labels))
    
    # Check that there is at least one possible combination of free nodes
    # that matches the constraints (otherwise, the while-loop
    # would never end)
    is_feasible = any(
        MIN_RANGE <= sum(combination) <= MAX_RANGE
        for k in range(MIN_NUM_OF_HIGHLIGHTED_NODES, max_num_of_highlighted_nodes + 1)
        for combination in combinations(free_labels, k)
    )
    
    # randomly highlight in green some free nodes, from min 1 to max N-1;
    # the total sum of their labels should be within MIN and MAX range
    nodes_to_highlight = []
    while is_feasible:
        # sampling from free_labels guarantees no overlap with the 'X' nodes
        nodes_to_highlight = random.sample(
            free_labels,
            k=random.randint(MIN_NUM_OF_HIGHLIGHTED_NODES, max_num_of_highlighted_nodes),
        )
        if MIN_RANGE <= sum(nodes_to_highlight) <= MAX_RANGE:
            break
    
    G = nx.grid_2d_graph(N, N)
    G.remove_edges_from(G.edges())
    
    pos = {n: n for n in G.nodes()}
    labels = {(i, j): i + 1 + (N - 1 - j) * N for i, j in G.nodes()}
    
    # green nodes in the generated graph SHOULD NOT match any of the 'X' red nodes
    colors = [
        "green" if label in nodes_to_highlight else "white" for label in labels.values()
    ]
    colors = [
        "red" if label in x_labels else color
        for color, label in zip(colors, labels.values())
    ]
    
    nx.draw_networkx(G, pos=pos, labels=labels, node_color=colors)
    plt.axis("off")
    plt.show()
    

    Each time the above cell is run in a Jupyter notebook, you will get a random output matching the given constraints:


    [three example output images]


    And so on.
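
The answer above hard-codes two of the patterns as DataFrames. As a rough sketch of how the 'X' positions could instead be read from the CSV files themselves, the snippet below uses the standard library's csv module. It assumes comma-separated files laid out like the tables in the question; `x_labels_from_rows` and `read_pattern` are hypothetical helper names, and an in-memory string stands in for pattern-1.csv so the example runs without the files.

```python
import csv
import io


def x_labels_from_rows(rows):
    """Return the 1-based, row-major labels of the cells that contain 'X'."""
    n_cols = len(rows[0])
    return [
        r * n_cols + c + 1
        for r, row in enumerate(rows)
        for c, cell in enumerate(row)
        if cell.strip() == "X"
    ]


def read_pattern(path):
    """Read one pattern CSV file and return the labels of its 'X' cells."""
    with open(path, newline="") as handle:
        return x_labels_from_rows(list(csv.reader(handle)))


# Stand-in for the contents of pattern-1.csv (assumed to be comma-separated)
sample = io.StringIO(
    "1,2,3,4,5\n"
    "X,7,8,9,10\n"
    "X,12,13,14,15\n"
    "16,17,X,19,20\n"
    "21,22,X,24,25\n"
)
pattern_1 = x_labels_from_rows(list(csv.reader(sample)))
print(pattern_1)  # [6, 11, 18, 23]
```

With the real files, something like `x_labels = sorted({label for path in paths for label in read_pattern(path)})` would merge all patterns into one list, where `paths` is a list of the CSV file names. Keeping one list of labels per pattern would also be a simple way to store every pattern as a single row of one combined file, which is one possible reading of the "single CSV file" idea in the question.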