Search code examples
neo4jrediscyphergraph-databasesredisgraph

cypher aggregating common node type between two distinct node types


Graph View

  • I have two distinct types of nodes (LABELS in redisgraph). The blue circles in the diagram are called WORKLOAD. The yellow ones are called API.

  • Between two workloads, there is a relationship/edge ACCESSES_WORKLOAD (indicated by the lightest blue/cyan edge). In the attached image, front-end workload accesses user and order workloads. The order workload accesses user workload.

  • Between a API node and a WORKLOAD node there exists two types of relationships/edges. One is called a CONTAINS_API (dark blue link in the image) and another is called a ACCESSES_API (violet link in the image). For example, in the attached image, the user workload contains eight APIs, Out of which six are consumed by front-end (the list: includes 3 /customers/..., /addresses, /register, /cards). The two remaining APIs are consumed by the orders (the list: /cards/{cardsId} and /addresses/{addressesId}.

  • The front-end also accesses three API that are contained by the orders workload too (/orders/...)

  • It is also possible that two WORKLOADs may have an ACCESSES_WORKLOAD edge between them, but without any API as node as common between them.

  • A WORKLOAD may have an extra edge/relationship called CAUSES_ATTACK to an API in addition to the ACCESSES_API edge. It is indicated by green color in the image. This may not exist always. In the attached image, the front-end workload has this edge to the /catalogue/sock API but not with any of the other APIs.

Now I want to get the list of all workload-pairs and the ID of the API nodes in between them, if any. The cypher query that I tried is:

MATCH (w1)
MATCH (w1)-[awe:ACCESSES_WORKLOAD]->(w2 {deleted_time: -1})
OPTIONAL MATCH (w2)-[:CONTAINS_API]->(a:API)
OPTIONAL MATCH (w1)-[:ACCESSES_API]->(a)
OPTIONAL MATCH (w1)-[cae:CAUSES_ATTACK]->(a)
WHERE ID(w1) IN %s
RETURN DISTINCT awe, w1, w2, collect(ID(a)), collect(distinct cae.name)

An output that I was expecting was table like below:

w1             | w2          | API IDs List           | Attack Name
======================================================================
front-end-node | user-node   | [6 elements in a list] | null
orders-node    | user-node   | [2 elements in a list] | null
front-end-node | orders-node | [3 elements in a list] | null
front-end-node | some-node   | [1 element in a list]  | attack-name

But what I am getting is:

w1             | w2          | API IDs List           | Attack Name
======================================================================
front-end-node | user-node   | [8 elements in a list] | null
orders-node    | user-node   | [8 elements in a list] | null
front-end-node | orders-node | [3 elements in a list] | null
front-end-node | some-node   | [1 element in a list]  | attack-name

In the first two rows, for the user node, I am getting the list of all the APIs contained by the user workload and not just the APIs accessed by the front-end and orders. The collect(ID(a)) is aggregating all the API nodes with a CONTAINS_API edge/relationship with the w2 WORKLOAD in the query instead of the APIs with only ACCESSES_API with w1 workload.

If I were to map my expectation to JSON for brevity, the output I require will be like:

{
  sourceWorkloadID: <ID(front-end)>
  targetWorkloadID: <ID(user)>
  apis: [ ID(/cards), ID(/register), ID(/addresses), ID(/customers/...)]  // len here is six
  attack: null
},
{
  sourceWorkloadID: <ID(orders)>
  targetWorkloadID: <ID(user)>
  apis: [ ID(/cards/{cardsId), ID(/addresses/{addressesId})] // len here is two
  attack: null
}, {3rd row}, {4th row}

Can someone help fix the cypher query ? I am doing this in redisgraph via cypher. I cannot use neo4j-only query options/utils/helpers. Thanks.


Solution

  • The way to do this is to do the edge matching into the same line as the API MATCH:

    MATCH (w1)
    MATCH (w1)-[awe:ACCESSES_WORKLOAD]->(w2 {deleted_time: -1})
    OPTIONAL MATCH (w2)-[:CONTAINS_API]->(a:API)<-[:ACCESSES_API]-(w1)
    OPTIONAL MATCH (w1)-[cae:CAUSES_ATTACK]->(a)
    WHERE ID(w1) IN %s
    RETURN DISTINCT awe, w1, w2, collect(ID(a)), collect(distinct cae.name)