I have been trying to build a graph structure that links entities according to features they are co-mentioned with, e.g. two places are linked if they are co-mentioned in an article.
I have managed to do so, but I am having trouble iteratively populating an edge with new information while keeping what is already there.
My approach (since I haven't found anything related anywhere) is to append the existing information to a list, append the new link to that list, and assign the list to the appropriate attribute.
    temp = []
    if G.has_edge(i[z],i[j]):
        temp.append(G[i[z]][i[j]]['article'])
        temp.append(url[index])
        G[i[z]][i[j]]['article'] = temp
    else:
        print "Create edge!"
        G.add_edge(i[z],i[j], article=url)
    del temp[:]
As you can see above, since there are many links to populate, I defined a dedicated list (temp) and loaded into it the old contents of the edge's attribute called article. If the edge does not exist, I create it and add as its first value the url that "brought" the two places together.
My problem is that, although I empty the list each time so that it is empty when a new pair comes in, when I look at an edge's urls I get something like this:
    {'article': [[...], u'http://www.huffingtonpost.co.uk/.../']
It seems I am keeping only the last link, because each time I delete the temporary list's contents, but I cannot find a better way to do this without declaring an unnecessary bunch of temp lists.
Any ideas?
Thank you for your time.
TL;DR summary: change your entire snippet to
    if G.has_edge(i[z],i[j]):
        G[i[z]][i[j]]['article'].append(url[index])
    else:
        # assuming url[index] is the single url string for this article
        G.add_edge(i[z],i[j], article=[url[index]])
Here's what's going on:
When you create the edge the first time you use
    G.add_edge(i[z],i[j], article=url)
So it's a string. But later when you do
    G[i[z]][i[j]]['article'] = temp
you've defined temp to be a list whose first element is G[i[z]][i[j]]['article']. So G[i[z]][i[j]]['article'] is now a list with two elements, the first of which is the old value for G[i[z]][i[j]]['article'] (a string) and the second of which is the new url (also a string).
Your problem comes at the later steps. From then on, it's exactly the same thing: G[i[z]][i[j]]['article'] is again a list with two elements, the first of which is its old value (a list) and the second is the new url (a string). So you've got a nested list.
Let's trace through with three urls: 'a', 'b', and 'c', and I'll use E to abbreviate G[i[z]][i[j]]['article']. First time through, you get E='a'. Second time through you get E=['a', 'b']. Third time through it gives E=[['a','b'],'c']. So it always makes E[0] the former value of E, and E[1] the new url.
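The trace can be reproduced without NetworkX. This is only a minimal sketch: the plain dict edge stands in for the edge's attribute dictionary G[i[z]][i[j]], and the function name add_url_buggy is my own, not from the question. A fresh temp is created on every call so that only the nesting effect shows up.

```python
# Plain dict standing in for the edge attributes G[i[z]][i[j]] (an assumption).
edge = {}

def add_url_buggy(edge, url):
    """Mimic the question's logic, with a fresh temp list on every call."""
    if 'article' in edge:
        temp = []
        temp.append(edge['article'])  # the old value goes in as ONE element
        temp.append(url)
        edge['article'] = temp
    else:
        edge['article'] = url         # first time: a bare string, not a list

for url in ['a', 'b', 'c']:
    add_url_buggy(edge, url)
    print(edge['article'])
# a
# ['a', 'b']
# [['a', 'b'], 'c']
```

Every later url would wrap the previous value one level deeper in the same way.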
Two choices:
1) You can handle the creation of temp differently depending on whether you've got a string or a list. This is the bad choice.
2) Better: make it a list the whole time through and then don't even deal with temp. Try creating the edge as (...,article = [url[index]]) and then just use G[i[z]][i[j]]['article'].append(url[index]) instead of defining temp.
So your code would be
    if G.has_edge(i[z],i[j]):
        G[i[z]][i[j]]['article'].append(url[index])
    else:
        # assuming url[index] is the single url string for this article
        G.add_edge(i[z],i[j], article=[url[index]])
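To sanity-check the "always a list" idea outside NetworkX, here is a rough sketch in which a plain dict keyed by the node pair stands in for the graph's edges. The names edges and add_article, the place names, and the example.com urls are all illustrative assumptions, not part of the question's code.

```python
# Hypothetical stand-in for the graph: maps an undirected node pair to its attributes.
edges = {}

def add_article(edges, u, v, url):
    """Record that `url` mentions both u and v, keeping earlier urls."""
    key = frozenset((u, v))                  # order of u and v does not matter
    if key in edges:
        edges[key]['article'].append(url)    # extend the existing list in place
    else:
        edges[key] = {'article': [url]}      # start with a one-element list

add_article(edges, 'London', 'Paris', 'http://example.com/1')
add_article(edges, 'Paris', 'London', 'http://example.com/2')
add_article(edges, 'London', 'Paris', 'http://example.com/3')
print(edges[frozenset(('London', 'Paris'))]['article'])
# ['http://example.com/1', 'http://example.com/2', 'http://example.com/3']
```

Because the attribute is a list from the very first insert, there is never a string-versus-list case to handle and no temporary list to clear.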
A separate thing that could also cause you problems is the call
    del temp[:]
This should cause behavior different from what you're describing, so I suspect your actual code differs a bit from the snippet. When you set G[i[z]][i[j]]['article'] = temp and then do del temp[:], the attribute and temp are not two lists: they are one list with two different names. So when you del temp[:] you're also emptying G[i[z]][i[j]]['article']. Consider the following:
    temp = []
    temp.append(1)
    print temp
    > [1]
    L = temp
    print L
    > [1]
    del temp[:]
    print L
    > []
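If you ever do need a scratch list like temp, storing a copy instead of the list itself avoids this aliasing. A small sketch in Python 3 syntax (the names alias and snapshot are just for illustration):

```python
temp = [1]
alias = temp             # second name for the SAME list object
snapshot = list(temp)    # new list object holding the same items

del temp[:]              # empties the one shared list in place

print(alias)             # []
print(snapshot)          # [1]
print(alias is temp)     # True
print(snapshot is temp)  # False
```

So assigning list(temp) rather than temp to the edge attribute would survive a later del temp[:], although appending directly to the attribute is still simpler.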