After thinking about the most efficient way to store unique graphs (in plain text), I determined the following procedure: determine the size n of the new graph, load the graphs inside the folder for graph size n, and check if any of them is isomorphic with the to-be-stored/new graph. However, that has a (*checking) complexity of O(n) per new graph, and I thought perhaps it is possible to compute some kind of hash of each graph and store the graph with that hash as its filename, so that a new graph can compute its own hash and see directly whether the hash/graph already exists (in some form) by simply reading the filenames instead of the file contents; a rough sketch of this idea is included below. This would have a (*checking) complexity of O(1) (actually O(0) according to the clarification below) instead of O(n) per new graph of size n.
I assume the graphs need to be stored in plain text. I assume checking (by loading a plain-text graph) is quite expensive.
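For concreteness, the filename-based check I have in mind would look roughly like the sketch below. The directory layout (graphs/<n>/<hash>.txt) and the function name is_already_stored are hypothetical, and graph_hash stands for whichever hash function ends up being used:
import os
from typing import Callable

import networkx as nx

def is_already_stored(graph: nx.Graph,
                      graph_hash: Callable[[nx.Graph], str],
                      base_dir: str = "graphs") -> bool:
    # Assumed layout: <base_dir>/<number of nodes>/<hash>.txt
    folder = os.path.join(base_dir, str(graph.number_of_nodes()))
    if not os.path.isdir(folder):
        return False
    # Only the filenames are read, never the file contents.
    return f"{graph_hash(graph)}.txt" in os.listdir(folder)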
Is it possible to compute, in Python, some hash of an undirected networkx graph x that allows another graph y to determine whether it is isomorphic to x by comparing its own hash to the hash of graph x? If yes, how would one do that?
The answer seems to be: yes.
One can use the Weisfeiler-Lehman hashing algorithm to do this:
https://networkx.org/documentation/stable/reference/algorithms/graph_hashing.html
import networkx as nx

# Build a small example graph: a triangle (5-6-7) with a pendant node (8).
G2 = nx.Graph()
G2.add_edges_from(
    [
        (5, 6),
        (6, 7),
        (7, 5),
        (7, 8),
    ]
)

# Optional: inspect the internal representation of the graph.
print(G2.__dict__)

# Compute the Weisfeiler-Lehman hash; isomorphic graphs produce the same hash.
some_hash: str = nx.algorithms.graph_hashing.weisfeiler_lehman_graph_hash(G2)
print(f'some_hash={some_hash}')
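To illustrate that the hash is invariant under relabelling (this check is my addition, continuing the snippet above), a relabelled copy of G2, which is isomorphic to G2 by construction, produces the same hash:
# G3 is a relabelled (hence isomorphic) copy of G2.
G3 = nx.relabel_nodes(G2, {5: "a", 6: "b", 7: "c", 8: "d"})
other_hash: str = nx.algorithms.graph_hashing.weisfeiler_lehman_graph_hash(G3)
print(some_hash == other_hash)  # True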
As pointed out in the comments by Paul Brodersen (or at least my interpretation of the comment), hash collisions may still occur, so if a stored graph has the same isomorphic hash, it is worth verifying that the two graphs are indeed isomorphic (a minimal check is sketched below). This would make the solution O(1) instead of O(0), which is still better than O(n). *For my weird definition of what constitutes 0, 1 and n complexity operations, see the clarification header in the original question.
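A minimal sketch of that verification step, reusing G2 and G3 from above; nx.is_isomorphic performs the full (more expensive) isomorphism test that rules out a hash collision:
# Equal hashes only suggest isomorphism; confirm it explicitly.
if some_hash == other_hash and nx.is_isomorphic(G2, G3):
    print("The graphs are truly isomorphic, not merely a hash collision.")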