Python NetworkX: preserving node IDs when saving/serializing networks to hard disk


I am using networkx 1.10 in Python for a project that requires dumping networks to the hard disk and later reloading them in order to continue the execution of an algorithm that manipulates those networks.

I've tried doing that in a couple of different ways: using nx.write_gml() and nx.read_gml() at first, and now with pickle/cPickle.

Although superficially everything seems to work fine, I've noticed that my simulation gives different results depending on whether or not a save/load happens at some point in time.

I think this might be related to the fact that some networks seem to be modified by the dumping/reloading procedure (which is of course unexpected).

For debugging, I am now saving and reloading each network with pickle and comparing their GML representations (obtained with nx.write_gml / nx.generate_gml) before and after dumping/reloading. Doing so, I've noticed some discrepancies.

In some cases it's just the order of some of the graph attributes that gets modified, which causes no harm in my program. In other cases, the order in which two edges appear in the GML representation is different; again, no harm.

Often, however, the IDs of some nodes are modified, as in this example:

https://www.diffchecker.com/zvzxrshy

Although the edges seem to be modified accordingly so that the network appears to be equivalent, this change in the ID can alter my simulation (for reasons that I am not going to explain).

I believe this might be the root of my problems.

Do you know why this is happening, even when using a low-level serialization mechanism such as the one implemented by pickle/cPickle?

How can I ensure that the networks are exactly the same before and after the serialization procedure?

I am doing something like this:

import pickle

with open('my_network.pickle', 'wb') as handle:
    pickle.dump(my_network, handle, protocol=pickle.HIGHEST_PROTOCOL)
...
with open('my_network.pickle', 'rb') as handle:
    my_network_reloaded = pickle.load(handle)

# compare nx.write_gml for my_network and my_network_reloaded

Any help will be highly appreciated: I've been fighting with this issue for the last three days and I am going crazy!

Thank you


Solution

  • They seem to be the same graphs, up to isomorphism.

    Your issue may be node order. networkx stores nodes and edges in dicts, which have no guaranteed iteration order, so the original and the reloaded graph can list the same nodes and edges in a different order.

    You have two solutions: either ignore that, or use ordered graphs:

    >>> from collections import OrderedDict
    >>> import networkx as nx
    >>> class OrderedGraph(nx.Graph):
    ...    node_dict_factory = OrderedDict
    ...    adjlist_dict_factory = OrderedDict
    >>> G = OrderedGraph()
    >>> G.add_nodes_from( (2,1) )
    >>> G.nodes()
    [2, 1]
    >>> G.add_edges_from( ((2,2), (2,1), (1,1)) )
    >>> G.edges()
    [(2, 2), (2, 1), (1, 1)]
    

    (example from the documentation of the Graph class)
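
To check whether a reload really changed anything, it may be more reliable to compare the graphs themselves rather than their GML text, since the text depends on iteration order. Below is a minimal sketch, assuming an undirected, non-multigraph nx.Graph with hashable node labels. `same_labelled_graph` is a hypothetical helper name introduced here, not part of networkx.

```python
import pickle

import networkx as nx


def same_labelled_graph(a, b):
    """Return True if a and b have the same graph attributes, node labels,
    edges and attribute dicts, regardless of iteration order.

    Hypothetical helper: assumes undirected graphs without parallel edges.
    """
    if a.graph != b.graph:
        return False

    # Dict equality does not depend on the order of the keys.
    nodes_a = {n: d for n, d in a.nodes(data=True)}
    nodes_b = {n: d for n, d in b.nodes(data=True)}
    if nodes_a != nodes_b:
        return False

    # An undirected edge may be reported as (u, v) or (v, u),
    # so key each edge by the unordered pair of its endpoints.
    def edge_map(g):
        return {frozenset((u, v)): d for u, v, d in g.edges(data=True)}

    return edge_map(a) == edge_map(b)


# Small demonstration graph (made-up data, for illustration only).
g = nx.Graph(name="demo")
g.add_node(2, colour="red")
g.add_node(1)
g.add_edge(2, 1, weight=0.5)
g.add_edge(1, 1)

# In-memory pickle round trip, equivalent to dump/load on a file.
reloaded = pickle.loads(pickle.dumps(g, protocol=pickle.HIGHEST_PROTOCOL))
print(same_labelled_graph(g, reloaded))
```

If this returns True after a pickle round trip, the node labels, edges and attributes survived and only the iteration order differs. Code that depends on that order would then need the ordered graph class shown above, or explicit sorting of nodes and edges before they are used.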