I am attempting to setup a new graph database to contain records of products and their relationship on each other's versioned components. For each product it can have many components, and each component is made up of multiple versions. Each version can be dependent on none or many versions of any other components. I want to be able to query this database to pick any version of a component and determine what other versioned components it is depended on, or what depends on it.
The data structure I have attempted in my examples is not defined yet, so if a completely different structure is more suitable I'm open to changing it. I originally considered setting the DEPENDS_ON
relationship directly between members. However, as new members will be added over time if a new member is added and falls within the version_min and version_max range of an existing records dependancy range, I would then need to go back and identify all affected records and update all of them, which doesn't feel like it would scale over time. This is what lead to the idea of having a member being dependent on a component, with the version limits defined in the relationship parameters.
I have put together a very simple example of 3 products (sample data at the end), with a single type of component and 1 version of each in all cases except one. I've then added only two dependencies into this example, 'a' depends on a range of 'b' versions, and one of the 'b' versions depends on a version of 'c'.
I would like to be able to perform a query to say "give me all downstream members which member prod_a_comp_1_v_1
depends on". Similarly I would like to do this in reverse too, which I imagine is achieved by just reversing some of the relationship parameters.
So far I've achieved this for a single hop (list b versions which a depends on), shown here:
MATCH
p=(a:member{name:'prod_a_comp_1_v_1'})-[d:DEPENDS_ON]->(c:component)<-[v:VERSION_OF]-(b:member) WHERE b.version >= d.version_min AND b.version <= d.version_max
RETURN p
But I don't know how to get it to recursively perform this query on the results of this first match. I investigated variable length/depths, but because there is a conditional parameter in the relationship in the variable depth (DEPENDS_ON), I could not get this to work.
From the example data if querying all downstream dependencies of prod_a_comp_1_v_1
it should return: [prod_b_comp_1_v_2
, prod_b_comp_1_v_3
, prod_c_comp_1_v_1
].
e.g. this figure:
Currently my thought is to use the above query and perform the repeated call on the database based on the results from the client end (capturing circular loops etc.), but that seems very undesirable.
Sample data:
CREATE
(prod_a:product {name:'prod_a'}),
(prod_a_comp_1:component {name: 'prod_a_comp_1', type:'comp_1'}),
(prod_a_comp_1)-[:COMPONENT_OF {type:'comp_1'}]->(prod_a),
(prod_a_comp_1_v_1:member {name:'prod_a_comp_1_v_1', type:'comp_1', version:1}),
(prod_a_comp_1_v_1)-[:VERSION_OF {version:1}]->(prod_a_comp_1)
CREATE
(prod_b:product {name:'prod_b'}),
(prod_b_comp_1:component {name: 'prod_b_comp_1', type:'comp_1'}),
(prod_b_comp_1)-[:COMPONENT_OF {type:'comp_1'}]->(prod_b),
(prod_b_comp_1_v_1:member {name:'prod_b_comp_1_v_1', type:'comp_1', version:1}),
(prod_b_comp_1_v_2:member {name:'prod_b_comp_1_v_2', type:'comp_1', version:2}),
(prod_b_comp_1_v_3:member {name:'prod_b_comp_1_v_3', type:'comp_1', version:3}),
(prod_b_comp_1_v_1)-[:VERSION_OF {version:1}]->(prod_b_comp_1),
(prod_b_comp_1_v_2)-[:VERSION_OF {version:2}]->(prod_b_comp_1),
(prod_b_comp_1_v_3)-[:VERSION_OF {version:3}]->(prod_b_comp_1)
CREATE
(prod_c:product {name:'prod_c'}),
(prod_c_comp_1:component {name: 'prod_c_comp_1', type:'comp_1'}),
(prod_c_comp_1)-[:COMPONENT_OF {type:'comp_1'}]->(prod_c),
(prod_c_comp_1_v_1:member {name:'prod_c_comp_1_v_1', type:'comp_1', version:1}),
(prod_c_comp_1_v_1)-[:VERSION_OF {version:1}]->(prod_c_comp_1)
CREATE
(prod_a_comp_1_v_1)-[:DEPENDS_ON {version_min:2, version_max:3}]->(prod_b_comp_1),
(prod_b_comp_1_v_3)-[:DEPENDS_ON {version_min:1, version_max:100}]->(prod_c_comp_1)
First, let me first adjust your node labels to use camel casing to follow the suggested neo4j naming convention. I will also rename member
to the much more intuitive Version
label.
You can change your data model so that the DEPENDS_ON
relationship type directly connects Version
nodes. Although this can result in many more relationships, it will also allow you to skip version numbers within a min/max range (e.g., if some version numbers in the range are unreliable/expensive/the wrong color/etc.). But the main reason to do that is it will make your use case extremely simple.
Here is how to create an updated version of your sample data, with my suggested adjustments (and with less redundant info):
CREATE
(a:Product {id: 'a'}),
(a_1:Component {id: 'a_1', type: 'comp_1'})-[:COMPONENT_OF]->(a),
(a_1_1:Version {id: 'a_1_1', version: 1})-[:VERSION_OF]->(a_1)
CREATE
(b:Product {id: 'b'}),
(b_1:Component {id: 'b_1', type: 'comp_1'})-[:COMPONENT_OF]->(b),
(b_1_1:Version {id: 'b_1_1', version:1})-[:VERSION_OF]->(b_1),
(b_1_2:Version {id: 'b_1_2', version:2})-[:VERSION_OF]->(b_1),
(b_1_3:Version {id: 'b_1_3', version:3})-[:VERSION_OF]->(b_1)
CREATE
(c:Product {id:'c'}),
(c_1:Component {id: 'c_1', type: 'comp_1'})-[:COMPONENT_OF]->(c),
(c_1_1:Version {id: 'c_1_1', version: 1})-[:VERSION_OF]->(c_1)
CREATE
(a_1_1)-[:DEPENDS_ON]->(b_1_2),
(a_1_1)-[:DEPENDS_ON]->(b_1_3),
(b_1_3)-[:DEPENDS_ON]->(c_1_1)
Here is how to get the versions that version a_1_1
depends on:
MATCH (v:Version)-[:DEPENDS_ON*]->(x:Version)
WHERE v.id = 'a_1_1'
RETURN x.id
And here is the result:
╒═══════╕
│"x.id" │
╞═══════╡
│"b_1_3"│
├───────┤
│"c_1_1"│
├───────┤
│"b_1_2"│
└───────┘
[UPDATE]
To reduce the total number of DEPENDS_ON
relationships, we can use optional VersionSet
nodes that represent shared collections of versions. A VersionSet
can be used as either endpoint of a DEPENDS_ON
relationship, just like Version
nodes. And they can even be chained together.
We can modify the above Cypher example code by replacing its last CREATE
clause with the following. In this new example, VersionSet s1
represents dependencies shared by versions a_1_1
and d_1_1
. Notice how sets s2
and s1
are part of a chain. Also notice how version d_1_1
depends on a Version
as well as VersionSet
. There is a lot of flexibility, and we can even handle a chain of version sets with overlapping versions (d_1_1
depends on c_1_1
in 2 different ways).
...
CREATE
(d:Product {id:'d'}),
(d_1:Component {id: 'd_1', type: 'comp_2'})-[:COMPONENT_OF]->(d),
(d_1_1:Version {id: 'd_1_1', version: 1})-[:VERSION_OF]->(d_1)
CREATE
(s1:VersionSet),
(s1)-[:DEPENDS_ON]->(b_1_2),
(s1)-[:DEPENDS_ON]->(b_1_3),
(s2:VersionSet),
(s2)-[:DEPENDS_ON]->(s1), // chaining 2 sets
(s2)-[:DEPENDS_ON]->(b_1_1)
CREATE
(a_1_1)-[:DEPENDS_ON]->(s1),
(b_1_3)-[:DEPENDS_ON]->(c_1_1),
(d_1_1)-[:DEPENDS_ON]->(s2),
(d_1_1)-[:DEPENDS_ON]->(c_1_1) // redundant dependency
This updated query returns the distinct version dependencies for d_1_1
:
MATCH (v:Version)-[:DEPENDS_ON*]->(x:Version)
WHERE v.id = 'd_1_1'
RETURN DISTINCT x.id
The result is:
╒═══════╕
│"x.id" │
╞═══════╡
│"c_1_1"│
├───────┤
│"b_1_1"│
├───────┤
│"b_1_3"│
├───────┤
│"b_1_2"│
└───────┘