python networkx social-networking directed-graph network-analysis

AttributeError: 'NoneType' object has no attribute 'get' when using greedy_modularity_communities from NetworkX

I keep getting the above error when trying to run the greedy_modularity_communities community-finding algorithm from NetworkX on a network of 123212 nodes and 329512 edges.

simpledatasetNX here is a NetworkX Graph object. Here is what I most recently ran:

greedy_modularity_communities(simpledatasetNX)

and what has been output:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-6-a3b0c8705138> in <module>()
----> 1 greedy_modularity_communities(simpledatasetNX)

2 frames
/usr/local/lib/python3.7/dist-packages/networkx/algorithms/community/modularity_max.py in greedy_modularity_communities(G, weight, resolution)
 98             if j != i
 99         }
--> 100         for i in range(N)
101     }
102     dq_heap = [

/usr/local/lib/python3.7/dist-packages/networkx/algorithms/community/modularity_max.py in <dictcomp>(.0)
 98             if j != i
 99         }
--> 100         for i in range(N)
101     }
102     dq_heap = [

/usr/local/lib/python3.7/dist-packages/networkx/algorithms/community/modularity_max.py in <dictcomp>(.0)
 96             - 2 * resolution * k[i] * k[j] * q0 * q0
 97             for j in [node_for_label[u] for u in G.neighbors(label_for_node[i])]
---> 98             if j != i
 99         }
100         for i in range(N)

AttributeError: 'NoneType' object has no attribute 'get'

I have hit this exact error repeatedly, despite several attempts to fix it. Here's where I started and what I tried:

First, I had constructed a NetworkX MultiDiGraph object from a data set, and I needed to turn this into a Graph object in order to run this algorithm. I looked at simply doing

desired_graph_object = nx.Graph(multidigraphobject)

and it seemed to do what I wanted it to do on a much simpler test graph I made, so I did this and then tried to run the algorithm, getting the above error.
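To make the conversion step concrete, here is a minimal sketch on a toy graph (the node names and counts are made up for illustration). It contrasts the plain nx.Graph(...) conversion, which leaves one edge per node pair, with an explicit collapse that sums the counts over all parallel and reciprocal edges into a hypothetical "weight" attribute. It is only one possible way to merge edges, not necessarily what the original graph needs.

```python
import networkx as nx

# Toy MultiDiGraph with parallel and reciprocal edges (illustrative data only).
M = nx.MultiDiGraph()
M.add_edge("alice", "bob", daddit=1, mommit=0)
M.add_edge("alice", "bob", daddit=0, mommit=1)
M.add_edge("bob", "alice", daddit=1, mommit=0)
M.add_edge("bob", "carol", daddit=0, mommit=1)

# Plain conversion: parallel and reciprocal edges collapse to one edge per pair,
# and no counts are added together.
naive = nx.Graph(M)

# Explicit collapse: sum the counts over every edge between the same two nodes.
# "weight" is a name chosen for this sketch, not an attribute from the question.
collapsed = nx.Graph()
collapsed.add_nodes_from(M.nodes(data=True))
for u, v, data in M.edges(data=True):
    total = data["daddit"] + data["mommit"]
    if collapsed.has_edge(u, v):
        collapsed[u][v]["weight"] += total
    else:
        collapsed.add_edge(u, v, weight=total)

print(naive.number_of_edges(), collapsed["alice"]["bob"]["weight"])
```

Summing into a single numeric attribute keeps every interaction in the count and gives community detection one numeric value per edge to use via its weight parameter.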

I was confused about what the issue could be. Then I remembered that my multidigraph object carried a lot of attributes on its edges (the original object had closer to 500,000 edges), some of them ints and some strings. When I simplified it down to a graph object, multiple edges between the same two nodes were combined, and since I hadn't done anything to account for this, something wonky probably happened to those attributes. This seemed like a simple enough fix: construct a new graph object from my dataset and keep only a single attribute for analysis related to the greedy_modularity_communities algorithm. I did that and ended up with the exact same error.

I now suspect that there is something wrong with the way I constructed both of these graphs as the nature of the error is the same whether I run the algorithm on the simplified multidigraph object or on the graph object.

Here is my code for constructing the graph object; the code for constructing the multidigraph object was essentially copy-pasted with minor adjustments to make this, so I don't feel the need to include it. As far as I can tell, both the graph object this code constructs and the multidigraph object discussed previously look how I intended them to.

%%time

for index, row in df.iterrows():  # Go through our dataframe row by row
  if row["link_id"] != row["parent_id"]:  # Check that the current row is a response to someone else
    link_id_df = link_id_dataframe_dict[row["link_id"]]  # Get the desired thread's dataframe
    # Case-insensitive check: ("Daddit" or "daddit") evaluates to just "Daddit"
    community = "daddit" if row["subreddit"].lower() == "daddit" else "mommit"
    for index2, row2 in link_id_df.iterrows():  # Iterate through the thread dataframe's rows (comments)
      if row2["id"] in row["parent_id"]:  # Go until we find the comment whose id matches our original comment's parent_id
        u, v = row["author"], row2["author"]
        if not G.has_edge(u, v):  # First interaction between these two users: add the edge with zeroed counts
          G.add_edge(u, v, daddit=0, mommit=0)
        G[u][v][community] += 1  # Count this comment in the community where it was posted

Some additional context: my original dataset is a massive data frame that I wanted to construct my network from. Each row represents a comment or post on a social media site. Building the network involves matching the ids of comments to the parent_ids of the comments that reply to them. The link_id_dataframe_dict is a dictionary where each key is a thread and the value for that key is a sub-dataframe of all the comments in that thread (i.e., with that link_id).

The idea is that we go through the entire data frame row by row, identify the thread (link_id) that each row/comment belongs to, and then search the associated link_id data frame for the row/comment that it is a response to. When we find it, we add an edge between two nodes: the edge represents the comment, and the two nodes are the user who posted the reply and the user who posted the comment being replied to. We also record which community the reply took place in by setting the attribute named after that community to 1 and the other community's attribute to 0, as a way of keeping track of where these users are interacting. In this version of the code, if the two users have interacted before, we instead add one to the attribute for the community in which the new interaction took place.
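The construction described above can also be sketched without the nested row scan, by first indexing authors by comment id. This is a rough sketch under stated assumptions: the rows below are made-up stand-ins for the data frame (with pandas, df.to_dict("records") gives the same list-of-dicts shape), and parent_id is assumed to have the form "<prefix>_<comment id>", which may not match the real data.

```python
import networkx as nx

# Stand-in rows with hypothetical values; the real data lives in a data frame.
comments = [
    {"id": "c1", "link_id": "t3_p1", "parent_id": "t3_p1", "author": "alice", "subreddit": "Daddit"},
    {"id": "c2", "link_id": "t3_p1", "parent_id": "t1_c1", "author": "bob", "subreddit": "Daddit"},
    {"id": "c3", "link_id": "t3_p1", "parent_id": "t1_c2", "author": "alice", "subreddit": "daddit"},
    {"id": "c4", "link_id": "t3_p2", "parent_id": "t3_p2", "author": "carol", "subreddit": "Mommit"},
    {"id": "c5", "link_id": "t3_p2", "parent_id": "t1_c4", "author": "bob", "subreddit": "Mommit"},
]

# One pass to index authors by comment id, so each parent lookup is a dict access.
author_by_id = {c["id"]: c["author"] for c in comments}

G = nx.Graph()
for c in comments:
    if c["link_id"] == c["parent_id"]:
        continue  # top-level comment: it replies to the post, not to another comment
    # Assumption: parent_id looks like "<prefix>_<comment id>".
    parent_author = author_by_id.get(c["parent_id"].split("_", 1)[-1])
    if parent_author is None:
        continue  # parent comment is not in the data
    community = "daddit" if c["subreddit"].lower() == "daddit" else "mommit"
    u, v = c["author"], parent_author
    if not G.has_edge(u, v):
        G.add_edge(u, v, daddit=0, mommit=0)
    G[u][v][community] += 1  # count the interaction in the community it happened in

print(G.edges(data=True))
```

The dictionary lookup replaces the inner loop over the thread's comments, which matters at the scale of hundreds of thousands of rows.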

UPDATES:

I removed the self-loops from the graph but unfortunately still run into the same error.
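For reference, self-loop removal can be done as in this small sketch (the toy graph is made up; the question does not show how the removal was actually done):

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([("alice", "alice"), ("alice", "bob"), ("bob", "carol"), ("carol", "carol")])

# Materialize the self-loop list first so the graph is not mutated while iterating.
G.remove_edges_from(list(nx.selfloop_edges(G)))

print(nx.number_of_selfloops(G), G.number_of_edges())
```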

Solution

It looks like you've encountered a known bug which has been corrected. More details are here:

https://github.com/networkx/networkx/pull/4996

I don't think the fix is in the most recent released version of networkx yet (it looks like it will appear in version 2.7), but replacing the algorithm you're using with the code here should fix this: https://github.com/networkx/networkx/blob/main/networkx/algorithms/community/modularity_max.py
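If swapping in the newer module is inconvenient, a possible workaround is sketched below. It rests on an assumption that is my reading of the traceback, not something confirmed in this thread: that the affected versions look up edge data by internal integer index rather than by node label, which fails when nodes are labeled with usernames. Under that assumption, relabeling nodes to consecutive integers before the call and mapping the result back should sidestep the error. The karate club graph with "user_N" labels is only a stand-in for the real network, and "original_label" is a name chosen for this sketch.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

print(nx.__version__)  # check which release is installed

# Stand-in graph with string node labels (illustrative only).
G = nx.relabel_nodes(nx.karate_club_graph(), lambda n: f"user_{n}")

# Relabel to 0..N-1, remembering each node's original label as a node attribute.
H = nx.convert_node_labels_to_integers(G, label_attribute="original_label")

# Run the algorithm on the integer-labeled copy, then map back to the original labels.
communities = [
    frozenset(H.nodes[i]["original_label"] for i in community)
    for community in greedy_modularity_communities(H)
]

print(len(communities))
```

The relabeled copy keeps the same edges and attributes, so the communities found should correspond one-to-one with communities of the original graph.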