Search code examples
neo4jcypher

Cypher Query to Filter SCENARIO Nodes Based on GAP Nodes


I'm working with a Neo4j database with a data model with GAP and SCENARIO nodes. Each GAP node has a "next" property that lists the keys of connected SCENARIO nodes. The SCENARIO nodes have a "parents" property indicating how many parent nodes they have. I want to filter SCENARIO nodes based on a list of selected GAP nodes and the number of parents.

Here is how GAP nodes are supposed to SCENARIO nodes:

  • gap1 key:g1
  • gap2 key:g2
  • gap3 key:g3

  • scenario1 key: s1
  • scenario2 key: s2
  • scenario3 key: s3

  • parents of scenario 1 are: gap1, gap2
  • parents of scenario 2 are: gap1, gap2, gap3
  • parents of scenario 3 are: gap2

Here is my GAP and SCENARIO nodes model:

 (gap1:GAP {name:  'g1',
            key:   'g1',
            desc:  '',
            next:  's1,s2'
  }),
//GAP2
  (gap2:GAP {name:  'g2',
             key:   'g2',
             desc:  '',
             next:  's1,s2,s3'
  }),
//GAP3
  (gap3:GAP {name:  'g3',
             key:   'g3',
             desc:  'g3',
             next:  's2'
  }),

//SCENARIO
  (scenario1:SCENARIO {name:    's1',
                       key:     's1',
                       DESC:    '',
                       parents: 2
                       }),
  (scenario2:SCENARIO {name:    's2',
                       key:     's2',
                       DESC:    '',
                       parents: 3
                       }),
  (scenario3:SCENARIO {name:    's3',
                       key:     's3',
                       DESC:    '',
                       parents: 1}),

If a gap is provided as input in the query and it is a parent of a scenario, but the number of parents of that scenario is more than 1 in the parent node, it should exclude the scenario from the result. If two gaps are provided, and both are parents of a single scenario, other scenario nodes are included. Still, the parent condition only meets for that single node; it should only return that single node.

I have tried too many combinations but always fail somewhere. This query somehow worked, but it failed for the following conditions:

  1. When g1,g2 are provided, it should return s1 only. it returns all three scenario
  2. if g1 is provided, it should return empty, but it returns s1,s2
  3. if g2 is provided, it should return only s3 but it returns all s1,s2,s3

I have been stuck on this for around three days but have been unable to figure out a solution. Please note that, per the scenario, a direct link from GAP to SCENARIO will not solve the problem.

MATCH (gap:GAP)
WHERE gap.name IN ['g1', 'g2', 'g3']

WITH COLLECT(DISTINCT gap) AS selectedGaps

WHERE size(selectedGaps) > 0  

MATCH (scenario:SCENARIO)
WHERE ANY(gap IN selectedGaps WHERE scenario.key IN split(gap.next, ','))
RETURN scenario.name AS ScenarioName;

Solution

  • You have inconsistencies in your sample data.

    • scenario2 says it has 3 parents, but there are only 2 GAP nodes containing the s2 key.
    • scenario3 says it has 1 parent, but there are actually 2 GAP nodes containing the s3 key.

    Here is a query that should work, assuming you fix your data to be consistent:

    MATCH (gap:GAP)
    WHERE gap.name IN $gaps
    UNWIND SPLIT(gap.next, ',') AS sKey
    WITH sKey, COUNT(*) AS cnt
    WITH COLLECT({sKey: sKey, cnt: cnt}) AS data
    MATCH (s:SCENARIO)
    WHERE ANY(d IN data WHERE d.sKey = s.key AND d.cnt = s.parents)
    RETURN s.name AS ScenarioName;
    

    The query assumes that the list of GAP names is provided as a $gaps parameter.

    Suggested new model

    You should strongly consider changing your data model, however. You are not using any relationships, which is the main strength of a graph database. GAP and SCENARIO nodes should not store relationship information as properties, but should be linked by actual relationships. I would also recommend conforming to the recommended neo4j naming convention, where labels are camel-cased.

    For example, this could conceptually be your new data model (without the next and parents properties):

    (g:Gap {name, key, desc})-[:HAS_SCENARIO]->(s:Scenario {name, key, desc})
    

    With this new model, the query for your use case would simply be:

    MATCH (g:Gap)-[:HAS_SCENARIO]->(s:Scenario)
    WHERE g.name IN $gaps
    WITH s, COUNT(g) AS cnt
    WHERE COUNT{ ()-[:HAS_SCENARIO]->(s) } = cnt
    RETURN s.name AS scenarioName;