I have 4 CSV files, each containing a matrix with a slightly different 'X' pattern.
The number of 'X' cells in each matrix varies from 1 to N-1, where N is the size of the matrix, (5,5) in this case:
pattern-1.csv   | pattern-2.csv   | pattern-3.csv   | pattern-4.csv
                |                 |                 |
 1  2  3  4  5  |  1  2  3  X  X  |  1  2  3  4  5  |  X  2  3  4  5
 X  7  8  9 10  |  6  7  8  9  X  |  6  7  X  9 10  |  6  X  8  9 10
 X 12 13 14 15  | 13 12 13 14 15  |  7 12  X 14 15  |  7 12 13 14 15
16 17  X 19 20  | 16 17 18 19 20  | 16 17  X 19 20  | 16 17 14 19 20
21 22  X 24 25  | 21 22 23 24 25  | 21 22  X 24 25  | 21 22 23 24 25
I wish to generate (N,N) graphs that randomly highlight between 1 and N-1 nodes (by highlight I mean giving the nodes a different colour). With each random run / iteration, the sum of the labels of the highlighted nodes should fall within the range 40 to 80 (for example, 3 randomly highlighted nodes with 12 + 15 + 20 = 47, and so on).
However, the highlighted nodes in the generated graphs SHOULD NOT match any of the 'X' patterns in the above 4 CSV files.
Basically, the code should take the 4 CSV files as a dataset and generate such graphs while respecting the constraints above. Eventually there will be more CSV files in the dataset.
I have some simple Python code that does the opposite of what I wish to achieve, with a single CSV file:
from tabulate import tabulate
import matplotlib.pyplot as plt
import pandas as pd
import networkx as nx
N = 5
G = nx.grid_2d_graph(N,N)
df = pd.read_csv('pattern-1.csv', index_col=None, header=None, dtype=str)
pos = dict( (n,n) for n in G.nodes() )
colors = [ "red" if df.loc[N - 1 - j, i] != "X" else "green" for i, j in G.nodes() ]
labels = dict( ((i,j), i + 1 + (N - 1 - j) * N) for i,j in G.nodes() )
print(tabulate(df, showindex=False, tablefmt="plain"))
#print("Sum of the nodes: " ???)
G.remove_edges_from(G.edges())
nx.draw_networkx(G, pos=pos, labels=labels, node_color=colors)
plt.axis('off')
plt.show()
Output of the above:
Any ideas on how I can achieve these goals?
Also, does it make sense to turn the individual patterns in the 4 CSV files into lists and put them into a single CSV file for simplicity?
If yes, what would the code look like?
Thanks in advance!
P.S. This is an extension of my previous question here:
Python and Networkx: Retrieve Node Label Number and Compare it to a CSV Value
Thanks again to @AndrejKesely
EDIT:
Although I still need help with randomly highlighting nodes that don't match the 'X' patterns in the matrices, I was able to figure out how to check that the total sum is within a range:
import random

N = 5
# range() excludes its end value, so add 1 to include label N*N and the sum 80
pick = random.sample(range(1, N * N + 1), N - 1)
r = range(40, 81)
if sum(pick) in r:
    print("True: ", sum(pick))
else:
    print("False: ", sum(pick))
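A minimal sketch of how that check could be turned into a retry loop, so that the number of picked nodes also varies from 1 to N-1 and both bounds (40 and 80) are included. The function name `pick_nodes` and its parameters are made up for illustration, and the `max_tries` cap is only an assumption to avoid looping forever when no pick can satisfy the range:

```python
import random

N = 5


def pick_nodes(n, low, high, max_tries=10_000):
    """Return a random list of 1 to n-1 distinct labels whose sum is in [low, high].

    Returns an empty list if no valid pick is found within max_tries attempts.
    """
    all_labels = range(1, n * n + 1)  # labels 1..n*n, including the last one
    for _ in range(max_tries):
        k = random.randint(1, n - 1)  # how many nodes to highlight this time
        pick = random.sample(all_labels, k)
        if low <= sum(pick) <= high:
            return pick
    return []


pick = pick_nodes(N, 40, 80)
print(pick, sum(pick))
```

This does not yet exclude the 'X' cells; it only covers the count and sum constraints.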
Here is one way to do it with the Python standard library's random and itertools modules:
import random
from itertools import combinations
import matplotlib.pyplot as plt
import networkx as nx
import pandas as pd
df1 = pd.DataFrame(
    {
        1: [1, "X", "X", 16, 21],
        2: [2, 7, 12, 17, 22],
        3: [3, 8, 13, "X", "X"],
        4: [4, 9, 14, 19, 24],
        5: [5, 10, 15, 20, 25],
    }
)
df2 = pd.DataFrame(
    {
        1: [1, 6, 13, 16, 21],
        2: [2, 7, 12, 17, 22],
        3: [3, 8, 13, 18, 23],
        4: ["X", 9, 14, 19, 24],
        5: ["X", "X", 15, 20, 25],
    }
)
N = 5
MIN_NUM_OF_HIGHLIGHTED_NODES = 1
MIN_RANGE = 40
MAX_RANGE = 80
dfs = (df1, df2)
x_labels = [
    i + 1
    for df in dfs
    for i, v in enumerate(item for row in df.to_numpy() for item in row)
    if v == "X"
]
# Labels start at 1, so 0 must not be counted as a free label
free_labels = [i for i in range(1, N * N + 1) if i not in x_labels]

# Check that there is at least one possible combination of nodes
# that will match the constraints (otherwise, the while-loop
# will never end)
limit = min(N - 1, len(free_labels))
max_num_of_highlighted_nodes = 0
for i in range(MIN_NUM_OF_HIGHLIGHTED_NODES, limit + 1):
    if any(
        MIN_RANGE <= sum(combination) <= MAX_RANGE
        for combination in combinations(free_labels, i)
    ):
        max_num_of_highlighted_nodes = limit
        break

# randomly highlight in green some nodes from min 1 to max N-1
# total sum of the labels of the highlighted nodes should be within MIN and MAX range
while True:
    if not max_num_of_highlighted_nodes:
        nodes_to_highlight = []
        break
    # Sampling from free_labels guarantees that no 'X' node is ever picked
    nodes_to_highlight = random.sample(
        free_labels,
        k=random.randint(MIN_NUM_OF_HIGHLIGHTED_NODES, max_num_of_highlighted_nodes),
    )
    if MIN_RANGE <= sum(nodes_to_highlight) <= MAX_RANGE:
        break
G = nx.grid_2d_graph(N, N)
G.remove_edges_from(G.edges())
pos = {n: n for n in G.nodes()}
labels = {(i, j): i + 1 + (N - 1 - j) * N for i, j in G.nodes()}
# green nodes in the generated graph SHOULD NOT match any of the 'X' red nodes
colors = [
    "green" if label in nodes_to_highlight else "white" for label in labels.values()
]
colors = [
    "red" if label in x_labels else color
    for color, label in zip(colors, labels.values())
]
nx.draw_networkx(G, pos=pos, labels=labels, node_color=colors)
plt.axis("off")
plt.show()
Each time the above cell is run in a Jupyter notebook, you will get a random output matching the given constraints:
And so on.
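If the patterns live in CSV files rather than in hard-coded DataFrames, the 'X' labels could be collected along these lines. This is only a sketch: it assumes each file is a headerless, comma-separated N x N grid like the ones in the question, and the helper name `x_labels_from_csv` is hypothetical. `io.StringIO` stands in for real files so that the snippet runs on its own; in practice you would pass paths such as 'pattern-1.csv'.

```python
import io

import pandas as pd


def x_labels_from_csv(sources, n):
    """Collect the labels (1..n*n, counted row by row) of every 'X' cell."""
    labels = set()
    for source in sources:
        # dtype=str keeps numbers and 'X' comparable as plain strings
        df = pd.read_csv(source, header=None, dtype=str)
        if df.shape != (n, n):
            raise ValueError(f"expected a {n}x{n} matrix, got {df.shape}")
        for i, value in enumerate(df.to_numpy().ravel()):
            if str(value).strip() == "X":
                labels.add(i + 1)
    return sorted(labels)


# Stand-ins for pattern-1.csv and pattern-2.csv from the question
pattern_1 = io.StringIO(
    "1,2,3,4,5\nX,7,8,9,10\nX,12,13,14,15\n16,17,X,19,20\n21,22,X,24,25\n"
)
pattern_2 = io.StringIO(
    "1,2,3,X,X\n6,7,8,9,X\n13,12,13,14,15\n16,17,18,19,20\n21,22,23,24,25\n"
)

x_labels = x_labels_from_csv([pattern_1, pattern_2], n=5)
print(x_labels)  # [4, 5, 6, 10, 11, 18, 23]
```

The returned list could then stand in for the `x_labels` comprehension above, so adding more pattern files to the dataset only means passing more paths.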