
Getting a node after write/read from gml NetworkX


I'm having a problem accessing nodes after reading a graph back from a GML file. (I'm sorry I can't provide an exact reproducible example: my code is 500 lines long, and smaller examples give weirdly correct results.) So I'll try to describe it as well as I can:

I've created a moderately large graph G (40k nodes, 1 million edges). I can access its nodes by their string labels simply by executing G['something']. I wrote it to a GML file and then read it back. Now I can't access the nodes by their labels as before (I get a KeyError), but I can access them by ids (which were created when writing the GML file, am I right?), i.e. G[1] gives me an AtlasView:

AtlasView({0: {'weight': 1}, 3253: {'weight': 8}, 9694: {'weight': 1}....

But 0, 3253, 9694 are also ids, not labels. Do you know what went wrong?
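To check the guess that the integer ids are created while writing, here is a minimal sketch, assuming NetworkX 2.x or 3.x, where `read_gml`/`parse_gml` take a `label` argument that defaults to `'label'`. It prints the GML text for a two-node graph and parses it back twice: with `label=None` the nodes stay keyed by the integer `id` and the original name sits in a `label` attribute (which is what the AtlasView above looks like), while the default turns the names back into node keys. If I remember correctly, some older 1.x releases instead had a `relabel=False` default, which would give exactly these integer keys; check the documentation for the installed version.

```python
import networkx as nx

# Tiny graph with string node names, like in the question.
G = nx.Graph()
G.add_edge("String1", "String2", weight=1)

# generate_gml yields the GML text line by line; each node is written
# with an integer "id" plus a "label" holding the original name.
gml_lines = list(nx.generate_gml(G))
print("\n".join(gml_lines))

# label=None keeps the integer ids as node keys and leaves the
# original name in the "label" node attribute.
H = nx.parse_gml(gml_lines, label=None)
print(list(H.nodes(data=True)))

# With the default label="label", the names become the node keys again.
K = nx.parse_gml(gml_lines)
print(sorted(K.nodes()))
```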

Here's my write and read code:

import networkx as nx

# mps: list of objects with .name and .speeches, built elsewhere in my code
G = nx.Graph()
for mp in mps:
    G.add_node(mp.name, bipartite=0)
    for word in mp.speeches:
        G.add_node(word, bipartite=1)
        if not G.has_edge(mp.name, word):
            G.add_edge(mp.name, word, weight=1)
        else:
            G[mp.name][word]['weight'] += 1
# Here I can simply access the node by G[mp.name]
# and the output is e.g. {'wznawiać': {'weight': 2}, 'obrady':....
nx.write_gml(G, "test.gml")

G = nx.read_gml('test.gml')
# Here I can't access the node by G[mp.name], only by its id
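In NetworkX 2.x/3.x the default `label='label'` of `read_gml` should already key the nodes by name. A more explicit variant, sketched below under that version assumption, reads with `label=None`, builds the id-to-label mapping with `get_node_attributes` and applies it with `relabel_nodes`, refusing to continue if a label is missing or duplicated (which would otherwise merge nodes). `read_gml_by_label` is a hypothetical helper name, and "Jan Kowalski" is a made-up MP name standing in for `mp.name`.

```python
import os
import tempfile

import networkx as nx


def read_gml_by_label(path):
    """Read a GML file and key the nodes by their 'label' strings.

    Reads with label=None (integer ids as node keys, original names kept
    in the 'label' node attribute), then relabels explicitly so the
    id -> label mapping is checked instead of trusted.
    """
    by_id = nx.read_gml(path, label=None)
    mapping = nx.get_node_attributes(by_id, "label")
    # Every node needs a label and labels must be unique; otherwise
    # relabel_nodes would silently merge nodes.
    if len(mapping) != len(by_id) or len(set(mapping.values())) != len(mapping):
        raise ValueError("GML labels are missing or not unique")
    by_label = nx.relabel_nodes(by_id, mapping)
    # The 'label' attribute is now redundant with the node key.
    for _, data in by_label.nodes(data=True):
        data.pop("label", None)
    return by_label


# Small version of the MP/word graph from the question.
G = nx.Graph()
G.add_node("Jan Kowalski", bipartite=0)
for word in ["wznawiać", "obrady", "obrady"]:
    G.add_node(word, bipartite=1)
    if G.has_edge("Jan Kowalski", word):
        G["Jan Kowalski"][word]["weight"] += 1
    else:
        G.add_edge("Jan Kowalski", word, weight=1)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.gml")
    nx.write_gml(G, path)
    H = read_gml_by_label(path)

print(H["Jan Kowalski"])
```

On a 40k-node graph the uniqueness check is cheap, and if it fails it points at the data rather than at the reader.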

Also, when I try to reproduce the problem on a smaller example, I get correct results. Maybe it's something to do with encoding?
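On the encoding question: as far as I know, NetworkX 2.x/3.x writes GML as 7-bit ASCII, escaping non-ASCII characters as character references (`&#...;`), and the reader turns them back into the original characters. The sketch below checks that round trip in memory with `generate_gml`/`parse_gml` using Polish labels ("Poseł" is a made-up label); if it passes on your installation, encoding alone is unlikely to turn labels into ids.

```python
import networkx as nx

# Polish labels with non-ASCII letters, as in the question.
G = nx.Graph()
G.add_edge("Poseł", "wznawiać", weight=2)

# generate_gml yields the file contents line by line; non-ASCII
# letters are written as character references (&#...;).
text = "\n".join(nx.generate_gml(G))
print(text)

# parse_gml converts the references back into the original characters.
H = nx.parse_gml(text)
print(sorted(H.nodes()))
print(H["Poseł"]["wznawiać"])
```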


Solution

  • This is a workaround rather than a solution, but it works when you can generate the file again (if you have a real solution, I'll be happy to see it, since I've spent a day on this):

    TL;DR: If you can generate the graph once again, do it and save it in another format.

    What I've learned: somehow, in my case, when reading a larger NetworkX graph from the .gml file (the file itself is OK; I checked it manually in a text editor), the graph becomes corrupted: the ids (generated automatically for the file) and the labels (by which the nodes can be accessed) get shifted. It looks like this (this code runs as-is; the problem appears only when analyzing larger graphs):

    import networkx as nx

    # prepare the data:
    G = nx.Graph()
    G.add_node("String1")
    G.add_node("String2")
    G.add_edge("String1", "String2", weight=1)
    nx.write_graphml(G, "test.graphml")
    nx.write_gml(G, "test.gml")
    
    # now reading:
    gml = nx.read_gml('test.gml')
    graphml = nx.read_graphml('test.graphml')
    
    # sort the edges by weight just to make this example clearer:
    seGml = sorted(gml.edges(data=True), key=lambda x: x[2]['weight'], reverse=True)
    seGraph = sorted(graphml.edges(data=True), key=lambda x: x[2]['weight'], reverse=True)
    print(seGml[0])
    print(seGraph[0])
    

    gives the output:

    (0, 1, {'weight': 1})
    ('String1', 'String2', {'weight': 1})
    

    In the GML case it's impossible to get the nodes by G["String1"] (it gives a KeyError). Collecting all the node attributes into a dictionary sometimes makes it possible to reach the node label, e.g. dictOfAtts[0] gives 'String1', but sometimes that also gives a KeyError.

    How to work around it: if you can generate the graph once again, do it, and write it in another format (.graphml worked for me). BUT you can't just read the .gml, write it to .graphml and then read the .graphml again: it's still corrupted.
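The "regenerate and save in another format" workaround can be checked with a small round trip, sketched below. It uses GraphML, as in the answer, plus plain Python `pickle` as an extra, swapped-in option when the file is only ever read back by NetworkX (NetworkX 3 dropped its own `write_gpickle` helper; only unpickle files you trust). `same_graph` is a hypothetical helper that compares node keys, edges and edge weights. Note that `read_graphml` returns node ids as strings, which matches here only because the original node names are strings.

```python
import os
import pickle
import tempfile

import networkx as nx


def same_graph(a, b):
    """True if a and b have the same node keys, edges and edge weights."""
    if set(a.nodes()) != set(b.nodes()):
        return False
    if a.number_of_edges() != b.number_of_edges():
        return False
    return all(
        b.has_edge(u, v) and b[u][v].get("weight") == d.get("weight")
        for u, v, d in a.edges(data=True)
    )


G = nx.Graph()
G.add_edge("String1", "String2", weight=1)

with tempfile.TemporaryDirectory() as tmp:
    # GraphML, the format that worked in the answer.
    graphml_path = os.path.join(tmp, "test.graphml")
    nx.write_graphml(G, graphml_path)
    from_graphml = nx.read_graphml(graphml_path)

    # Plain pickle: Python-only, but keeps node keys and attributes as-is.
    pickle_path = os.path.join(tmp, "test.pickle")
    with open(pickle_path, "wb") as f:
        pickle.dump(G, f)
    with open(pickle_path, "rb") as f:
        from_pickle = pickle.load(f)

print(same_graph(G, from_graphml), same_graph(G, from_pickle))
```

Running `same_graph` right after writing the large graph is a cheap way to confirm the saved file really reproduces it before the original data is discarded.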