Search code examples
neo4jcypher

How do you perform a recursive query in cypher where there is a conditional within the path relationship?


I am attempting to setup a new graph database to contain records of products and their relationship on each other's versioned components. For each product it can have many components, and each component is made up of multiple versions. Each version can be dependent on none or many versions of any other components. I want to be able to query this database to pick any version of a component and determine what other versioned components it is depended on, or what depends on it.

The data structure I have attempted in my examples is not defined yet, so if a completely different structure is more suitable I'm open to changing it. I originally considered setting the DEPENDS_ON relationship directly between members. However, as new members will be added over time if a new member is added and falls within the version_min and version_max range of an existing records dependancy range, I would then need to go back and identify all affected records and update all of them, which doesn't feel like it would scale over time. This is what lead to the idea of having a member being dependent on a component, with the version limits defined in the relationship parameters.

I have put together a very simple example of 3 products (sample data at the end), with a single type of component and 1 version of each in all cases except one. I've then added only two dependencies into this example, 'a' depends on a range of 'b' versions, and one of the 'b' versions depends on a version of 'c'.

I would like to be able to perform a query to say "give me all downstream members which member prod_a_comp_1_v_1 depends on". Similarly I would like to do this in reverse too, which I imagine is achieved by just reversing some of the relationship parameters.

So far I've achieved this for a single hop (list b versions which a depends on), shown here:

MATCH
p=(a:member{name:'prod_a_comp_1_v_1'})-[d:DEPENDS_ON]->(c:component)<-[v:VERSION_OF]-(b:member) WHERE b.version >= d.version_min AND b.version <= d.version_max
RETURN p

Single hop filtered dependent member versions

But I don't know how to get it to recursively perform this query on the results of this first match. I investigated variable length/depths, but because there is a conditional parameter in the relationship in the variable depth (DEPENDS_ON), I could not get this to work.

From the example data if querying all downstream dependencies of prod_a_comp_1_v_1 it should return: [prod_b_comp_1_v_2, prod_b_comp_1_v_3, prod_c_comp_1_v_1]. e.g. this figure: Recursive query of all downstream dependencies

Currently my thought is to use the above query and perform the repeated call on the database based on the results from the client end (capturing circular loops etc.), but that seems very undesirable.

Sample data:

CREATE
(prod_a:product {name:'prod_a'}),
(prod_a_comp_1:component {name: 'prod_a_comp_1', type:'comp_1'}),
(prod_a_comp_1)-[:COMPONENT_OF {type:'comp_1'}]->(prod_a),
(prod_a_comp_1_v_1:member {name:'prod_a_comp_1_v_1', type:'comp_1', version:1}),
(prod_a_comp_1_v_1)-[:VERSION_OF {version:1}]->(prod_a_comp_1)

CREATE
(prod_b:product {name:'prod_b'}),
(prod_b_comp_1:component {name: 'prod_b_comp_1', type:'comp_1'}),
(prod_b_comp_1)-[:COMPONENT_OF {type:'comp_1'}]->(prod_b),
(prod_b_comp_1_v_1:member {name:'prod_b_comp_1_v_1', type:'comp_1', version:1}),
(prod_b_comp_1_v_2:member {name:'prod_b_comp_1_v_2', type:'comp_1', version:2}),
(prod_b_comp_1_v_3:member {name:'prod_b_comp_1_v_3', type:'comp_1', version:3}),
(prod_b_comp_1_v_1)-[:VERSION_OF {version:1}]->(prod_b_comp_1),
(prod_b_comp_1_v_2)-[:VERSION_OF {version:2}]->(prod_b_comp_1),
(prod_b_comp_1_v_3)-[:VERSION_OF {version:3}]->(prod_b_comp_1)

CREATE
(prod_c:product {name:'prod_c'}),
(prod_c_comp_1:component {name: 'prod_c_comp_1', type:'comp_1'}),
(prod_c_comp_1)-[:COMPONENT_OF {type:'comp_1'}]->(prod_c),
(prod_c_comp_1_v_1:member {name:'prod_c_comp_1_v_1', type:'comp_1', version:1}),
(prod_c_comp_1_v_1)-[:VERSION_OF {version:1}]->(prod_c_comp_1)

CREATE
(prod_a_comp_1_v_1)-[:DEPENDS_ON {version_min:2, version_max:3}]->(prod_b_comp_1),
(prod_b_comp_1_v_3)-[:DEPENDS_ON {version_min:1, version_max:100}]->(prod_c_comp_1)

Figure showing full sample data set: Full example dataset


Solution

  • First, let me first adjust your node labels to use camel casing to follow the suggested neo4j naming convention. I will also rename member to the much more intuitive Version label.

    You can change your data model so that the DEPENDS_ON relationship type directly connects Version nodes. Although this can result in many more relationships, it will also allow you to skip version numbers within a min/max range (e.g., if some version numbers in the range are unreliable/expensive/the wrong color/etc.). But the main reason to do that is it will make your use case extremely simple.

    Here is how to create an updated version of your sample data, with my suggested adjustments (and with less redundant info):

    CREATE
      (a:Product {id: 'a'}),
      (a_1:Component {id: 'a_1', type: 'comp_1'})-[:COMPONENT_OF]->(a),
      (a_1_1:Version {id: 'a_1_1', version: 1})-[:VERSION_OF]->(a_1)
    
    CREATE
      (b:Product {id: 'b'}),
      (b_1:Component {id: 'b_1', type: 'comp_1'})-[:COMPONENT_OF]->(b),
      (b_1_1:Version {id: 'b_1_1', version:1})-[:VERSION_OF]->(b_1),
      (b_1_2:Version {id: 'b_1_2', version:2})-[:VERSION_OF]->(b_1),
      (b_1_3:Version {id: 'b_1_3', version:3})-[:VERSION_OF]->(b_1)
    
    CREATE
      (c:Product {id:'c'}),
      (c_1:Component {id: 'c_1', type: 'comp_1'})-[:COMPONENT_OF]->(c),
      (c_1_1:Version {id: 'c_1_1', version: 1})-[:VERSION_OF]->(c_1)
    
    CREATE
      (a_1_1)-[:DEPENDS_ON]->(b_1_2),
      (a_1_1)-[:DEPENDS_ON]->(b_1_3),
      (b_1_3)-[:DEPENDS_ON]->(c_1_1)
    

    Here is how to get the versions that version a_1_1 depends on:

    MATCH (v:Version)-[:DEPENDS_ON*]->(x:Version)
    WHERE v.id = 'a_1_1'
    RETURN x.id
    

    And here is the result:

    ╒═══════╕
    │"x.id" │
    ╞═══════╡
    │"b_1_3"│
    ├───────┤
    │"c_1_1"│
    ├───────┤
    │"b_1_2"│
    └───────┘
    

    [UPDATE]

    To reduce the total number of DEPENDS_ON relationships, we can use optional VersionSet nodes that represent shared collections of versions. A VersionSet can be used as either endpoint of a DEPENDS_ON relationship, just like Version nodes. And they can even be chained together.

    We can modify the above Cypher example code by replacing its last CREATE clause with the following. In this new example, VersionSet s1 represents dependencies shared by versions a_1_1 and d_1_1. Notice how sets s2 and s1 are part of a chain. Also notice how version d_1_1 depends on a Version as well as VersionSet. There is a lot of flexibility, and we can even handle a chain of version sets with overlapping versions (d_1_1 depends on c_1_1 in 2 different ways).

    ...
    
    CREATE
      (d:Product {id:'d'}),
      (d_1:Component {id: 'd_1', type: 'comp_2'})-[:COMPONENT_OF]->(d),
      (d_1_1:Version {id: 'd_1_1', version: 1})-[:VERSION_OF]->(d_1)
    
    CREATE
      (s1:VersionSet),
      (s1)-[:DEPENDS_ON]->(b_1_2),
      (s1)-[:DEPENDS_ON]->(b_1_3),
      (s2:VersionSet),
      (s2)-[:DEPENDS_ON]->(s1),      // chaining 2 sets
      (s2)-[:DEPENDS_ON]->(b_1_1)
    
    CREATE
      (a_1_1)-[:DEPENDS_ON]->(s1),
      (b_1_3)-[:DEPENDS_ON]->(c_1_1),
      (d_1_1)-[:DEPENDS_ON]->(s2),
      (d_1_1)-[:DEPENDS_ON]->(c_1_1) // redundant dependency
    

    The resulting graph: enter image description here

    This updated query returns the distinct version dependencies for d_1_1:

    MATCH (v:Version)-[:DEPENDS_ON*]->(x:Version)
    WHERE v.id = 'd_1_1'
    RETURN DISTINCT x.id
    

    The result is:

    ╒═══════╕
    │"x.id" │
    ╞═══════╡
    │"c_1_1"│
    ├───────┤
    │"b_1_1"│
    ├───────┤
    │"b_1_3"│
    ├───────┤
    │"b_1_2"│
    └───────┘