My code adds nodes and creates relationships. When the connections are a->b, a->c, a->d, it works (the new relationships are added to the existing node a). But when I add the connection f->a, a second node named a is created. How can I make it update the existing node a instead?
    from py2neo import Graph, Path

    graph = Graph()
    with open('test2') as fp:
        for line in fp:
            result = line.split('\t')
            category1 = graph.merge_one("Category", "name", result[0][result[0].rfind(':')+1:])
            category2 = graph.merge_one("Category", "name", result[1][result[1].rfind(':')+1:])
            print(result[0][result[0].rfind(':')+1:] + "|" + result[1][result[1].rfind(':')+1:])
            graph.create_unique(Path(category1, "SubCategoryOf", category2))
My test file is:
Category:Wars_involving_Burma Category:Wars_by_country Category:Wars_involving_Burma Category:Military_history_of_Burma Category:Wars_involving_Burma Category:Foreign_relations_of_Burma Category:World_War_II Category:Wars_involving_Bulgaria Category:World_War_II Category:Wars_involving_Burma
In this example, Category:Wars_involving_Burma is created twice.
When I run your example, I don't get a duplicate node. From your question I can't tell how many 'Category:...' entries you have on each line; from how you split the line, I assumed it's always two. One possible issue is that you don't remove line endings, so one of your 'Category:Wars_involving_Burma' nodes might have a newline character at the end of its name. Also, what you pasted is space-separated, not tab-separated (\t).
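To illustrate the line-ending point, here is a minimal sketch that needs no database. The sample line is made up to mirror the format in the question, and `name_after_colon` is a hypothetical helper that reuses the question's slicing; I'm assuming a tab-separated line that ends with a newline.

```python
# Hypothetical sample line, as it would come out of "for line in fp"
# for a tab-separated file: the line ending is still attached.
line = "Category:World_War_II\tCategory:Wars_involving_Burma\n"


def name_after_colon(field):
    # Same slicing as in the question: everything after the last ':'.
    return field[field.rfind(':') + 1:]


# Splitting on '\t' only: the second field keeps the trailing newline.
raw = [name_after_colon(f) for f in line.split('\t')]

# Stripping the line ending first, then splitting on any whitespace.
clean = [name_after_colon(f) for f in line.rstrip().split()]

print(repr(raw[1]))    # the name still ends with '\n'
print(repr(clean[1]))  # the name without the line ending
```

Since the two strings differ, a merge on the "name" property would treat them as two different values, which would explain a second node that looks identical when printed.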
Here is a suggestion for how to improve your code, assuming your file looks like http://paste.ubuntu.com/10874106/:
    from py2neo import Graph

    graph = Graph()
    with open('test2') as fp:
        for line in fp:
            # strip the line ending first, then split on whitespace
            # I assume every line has two category entries?
            result = line.rstrip().split()
            # getting the category name is easier and more readable like this
            category1 = graph.merge_one("Category", "name", result[0].split(':')[1])
            category2 = graph.merge_one("Category", "name", result[1].split(':')[1])
            print(result[0].split(':')[1] + '\t' + result[1].split(':')[1])
            # you don't need a Path here
            graph.create_unique((category1, "SubCategoryOf", category2))
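If you want to check the parsing before touching the database, a dry run along these lines might help. This is only a sketch: `parse_pairs` is a hypothetical helper name, the sample lines are made up in the format of your file, and skipping lines that don't have exactly two entries is my assumption about how you'd want to handle bad input.

```python
def parse_pairs(lines):
    """Yield (child, parent) names from 'Category:X<whitespace>Category:Y' lines."""
    for line in lines:
        fields = line.rstrip().split()
        if len(fields) != 2:
            # Assumption: silently skip blank or malformed lines.
            continue
        # Assumes each field contains a ':'; a field without one raises IndexError.
        yield tuple(f.split(':', 1)[1] for f in fields)


# Hypothetical sample input, including a blank line.
sample = [
    "Category:Wars_involving_Burma\tCategory:Wars_by_country\n",
    "Category:World_War_II\tCategory:Wars_involving_Burma\n",
    "\n",
]

pairs = list(parse_pairs(sample))
names = set(name for pair in pairs for name in pair)

print(pairs)          # the relationships the loop would create
print(sorted(names))  # the distinct node names the loop would merge
```

The number of distinct names is the number of "Category" nodes you should expect afterwards, so a mismatch with what you see in the database points at stray whitespace or a similar difference in the names.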
In addition, if you want your "Category" nodes to be unique, you should have a uniqueness constraint on the "name" property of "Category" nodes.
Cypher:
    CREATE CONSTRAINT ON (n:Category) ASSERT n.name IS UNIQUE
py2neo:
    graph.schema.create_uniqueness_constraint('Category', 'name')