I'm trying to model a versioning system of a tree structure. Here's the example:
Version 1 of the tree:
(a)<-[:BELONGS_TO]-(b)
(a)<-[:BELONGS_TO]-(c)
(d)<-[:BELONGS_TO]-(e)
Version 2 of the tree:
(a)<-[:BELONGS_TO]-(c)
(d)<-[:BELONGS_TO]-(b)
(d)<-[:BELONGS_TO]-(e)
So, in version 2, I moved B into D. I can currently achieve this by creating a version node, relating it to all the nodes it contains, and putting a time property on each BELONGS_TO relationship. The graph then looks like this:
Version 1
MERGE (a:COMPONENT {name:'a'})
MERGE (b:COMPONENT {name:'b'})
MERGE (c:COMPONENT {name:'c'})
MERGE (d:COMPONENT {name:'d'})
MERGE (e:COMPONENT {name:'e'})
MERGE (version:VERSION {createdOn:'16-4-2015 12:17:00'})
MERGE (version)-[:CONTAINS]->(a)
MERGE (version)-[:CONTAINS]->(b)
MERGE (version)-[:CONTAINS]->(c)
MERGE (version)-[:CONTAINS]->(d)
MERGE (version)-[:CONTAINS]->(e)
MERGE (a)<-[:BELONGS_TO {createdOn:'16-4-2015 12:17:00'}]-(b)
MERGE (a)<-[:BELONGS_TO {createdOn:'16-4-2015 12:17:00'}]-(c)
MERGE (d)<-[:BELONGS_TO {createdOn:'16-4-2015 12:17:00'}]-(e)
RETURN *
Version 2
MERGE (a:COMPONENT {name:'a'})
MERGE (b:COMPONENT {name:'b'})
MERGE (c:COMPONENT {name:'c'})
MERGE (d:COMPONENT {name:'d'})
MERGE (e:COMPONENT {name:'e'})
MERGE (version:VERSION {createdOn:'28-5-2015 13:00:00'})
MERGE (version)-[:CONTAINS]->(a)
MERGE (version)-[:CONTAINS]->(b)
MERGE (version)-[:CONTAINS]->(c)
MERGE (version)-[:CONTAINS]->(d)
MERGE (version)-[:CONTAINS]->(e)
MERGE (a)<-[:BELONGS_TO {createdOn:'28-5-2015 13:00:00'}]-(c)
MERGE (d)<-[:BELONGS_TO {createdOn:'28-5-2015 13:00:00'}]-(b)
MERGE (d)<-[:BELONGS_TO {createdOn:'28-5-2015 13:00:00'}]-(e)
RETURN *
I import my tree structure using the MERGE clause because I want to reuse the nodes of previous versions whenever the new version references the same nodes. The reason is that there are mappings to every node in the tree structure, which I don't want to lose.
My query is like this:
MATCH (version:VERSION)-[:CONTAINS]->(component:COMPONENT)
WHERE version.createdOn = '28-5-2015 13:00:00'
OPTIONAL MATCH
(component)-[rels:BELONGS_TO*]->(componentParents)
WHERE ALL(rel IN rels WHERE rel.createdOn = '28-5-2015 13:00:00')
RETURN *
My approach works, but I am concerned about performance once the number of versions grows really large (say 10,000 versions, although that may not be a realistic scenario). As I understand it, the query will have to compare many relationships with different 'createdOn' values to find the right path for the selected version. Is my concern justified? Is it fine to model the tree structure like this? If not, what better model would you suggest?
Thank you in advance and any idea would be appreciated!
Versioning is not an exact science, especially in graphs and for use cases like yours. There are many ways to do it, depending on your needs.
Your model may hurt performance as your graph grows. Filtering on relationship or node properties (relationship properties are not indexed) is a very expensive operation. Generally, relationships should drive your traversal through the graph and add semantic meaning to it.
A common pattern is to represent the states of your nodes explicitly: you have a unique node representing your Component and multiple ComponentState nodes that represent the state of that component over time.
In ASCII art, you would have:
(Component)-[:LAST_STATE]->(ComponentState {version:2})
-[:PREVIOUS_STATE]->(ComponentState {version: 1})
These ComponentState nodes then carry the BELONGS_TO relationships to the Component nodes they belong to.
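As a rough sketch of how this pattern could be written for component b from the question: the labels and relationship types follow the ASCII art and the query in this answer, but the exact data set (including the `name` property on the state nodes) is an assumption and may differ from what is in the console.

```cypher
// Sketch only: property names and the data set are assumptions.
// Unique Component nodes, one per component.
CREATE (a:Component {name: 'a'}),
       (b:Component {name: 'b'}),
       (d:Component {name: 'd'})
// One ComponentState per version of component b.
CREATE (b1:ComponentState {name: 'b', version: 1}),
       (b2:ComponentState {name: 'b', version: 2})
// The component points at its latest state; states chain backwards in time.
CREATE (b)-[:LAST_STATE]->(b2),
       (b2)-[:PREVIOUS_STATE]->(b1)
// Each state points at the parent it had in that version.
CREATE (b1)-[:BELONGS_TO]->(a),
       (b2)-[:BELONGS_TO]->(d)
```

The Component nodes stay unique, so any mappings attached to them survive across versions, while moving b from a to d only adds a new state node.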
I've created a little Neo4j console with your example :
http://console.neo4j.org/r/4trq7i
If you need to know what the graph looked like at instant 1 (here, version 1), you just need to match the ComponentState nodes for that instant and traverse their BELONGS_TO relationships:
MATCH (n:ComponentState {version: 1})
MATCH (n)-[:BELONGS_TO]->(parent)
RETURN n.name, parent.name
This would return the following:
n.name parent.name
b a
c a
e d
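Along the same lines, the current shape of the tree could be read by going through LAST_STATE rather than filtering on a version number. This is only a sketch and assumes every Component has a LAST_STATE relationship as in the ASCII art above:

```cypher
// Sketch: read the latest parent of every component.
// Assumes each Component has exactly one LAST_STATE relationship.
MATCH (c:Component)-[:LAST_STATE]->(s:ComponentState)-[:BELONGS_TO]->(parent:Component)
RETURN c.name, parent.name
```

Here the traversal is driven by relationship types alone, so no property comparison is needed to find the newest version.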
So for your use case, you can store times in place of versions here, or use a TimeTree.
If you need it, you may also want to create the BELONGS_TO relationships between component states, rather than from a component state to a component.
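With state-to-state relationships, a variable-length traversal like the one in the question could stay inside a single version. The following is a sketch under two assumptions that are not part of the console example: every component gets a ComponentState node for each version, and BELONGS_TO only links states of the same version.

```cypher
// Sketch: all ancestors of component e as of version 2.
// Assumes BELONGS_TO connects ComponentState nodes of the same version only.
MATCH (s:ComponentState {name: 'e', version: 2})-[:BELONGS_TO*]->(ancestor:ComponentState)
RETURN ancestor.name
```

The trade-off is more state nodes per version in exchange for paths that never need a per-relationship property check.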
Hope this will help you.