After thinking about the most efficient way to store unique graphs (in plain text), I determined the following procedure: determine the size n of the new graph, load the graphs inside the folder for graph size n, and check if any of them is isomorphic with the to-be-stored/new graph. However, that has a (*checking) complexity of O(n) per new graph, and I thought perhaps it is possible to compute some kind of hash of each graph and store the graph with that hash as its filename, so that a new graph can compute its own hash and see directly whether the hash/graph already exists (in some form) by simply reading the filenames instead of the file contents; a rough sketch of this idea is included below. This would have a (*checking) complexity of O(1) (actually O(0) according to the clarification below) instead of O(n) per new graph of size n.
I assume the graphs need to be stored in plain text. I assume checking (by loading a plain-text graph) is quite expensive.
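For concreteness, the filename-based check I have in mind would look roughly like the sketch below. The directory layout (graphs/<n>/<hash>.txt) and the function name is_already_stored are hypothetical, and graph_hash stands for whichever hash function ends up being used:
import os
from typing import Callable

import networkx as nx

def is_already_stored(graph: nx.Graph,
                      graph_hash: Callable[[nx.Graph], str],
                      base_dir: str = "graphs") -> bool:
    # Assumed layout: <base_dir>/<number of nodes>/<hash>.txt
    folder = os.path.join(base_dir, str(graph.number_of_nodes()))
    if not os.path.isdir(folder):
        return False
    # Only the filenames are read, never the file contents.
    return f"{graph_hash(graph)}.txt" in os.listdir(folder)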
Is it possible to compute, in Python, some hash of an undirected networkx graph x that allows another graph y to determine whether it is isomorphic to x by comparing its own hash to the hash of graph x? If yes, how would one do that?
The answer seems to be: yes.
One can use the Weisfeiler-Lehman hashing algorithm to do this:
https://networkx.org/documentation/stable/reference/algorithms/graph_hashing.html
import networkx as nx

# Build a small example graph: a triangle (5-6-7) with a pendant node (8).
G2 = nx.Graph()
G2.add_edges_from(
    [
        (5, 6),
        (6, 7),
        (7, 5),
        (7, 8),
    ]
)

# Optional: inspect the internal representation of the graph.
print(G2.__dict__)

# Compute the Weisfeiler-Lehman hash; isomorphic graphs produce the same hash.
some_hash: str = nx.algorithms.graph_hashing.weisfeiler_lehman_graph_hash(G2)
print(f'some_hash={some_hash}')
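To illustrate that the hash is invariant under relabelling (this check is my addition, continuing the snippet above), a relabelled copy of G2, which is isomorphic to G2 by construction, produces the same hash:
# G3 is a relabelled (hence isomorphic) copy of G2.
G3 = nx.relabel_nodes(G2, {5: "a", 6: "b", 7: "c", 8: "d"})
other_hash: str = nx.algorithms.graph_hashing.weisfeiler_lehman_graph_hash(G3)
print(some_hash == other_hash)  # True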
As pointed out in the comments by Paul Brodersen (or at least my interpretation of the comment), hash collisions may still occur, so if a stored graph has the same isomorphic hash, it is worth verifying that the two graphs are indeed isomorphic (a minimal check is sketched below). This would make the solution O(1) instead of O(0), which is still better than O(n). *For my weird definition of what constitutes 0, 1 and n complexity operations, see the clarification header in the original question.
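A minimal sketch of that verification step, reusing G2 and G3 from above; nx.is_isomorphic performs the full (more expensive) isomorphism test that rules out a hash collision:
# Equal hashes only suggest isomorphism; confirm it explicitly.
if some_hash == other_hash and nx.is_isomorphic(G2, G3):
    print("The graphs are truly isomorphic, not merely a hash collision.")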