I have two distinct types of nodes (LABELS in redisgraph). The blue circles in the diagram are called WORKLOAD
. The yellow ones are called API
.
Between two workloads, there is a relationship/edge ACCESSES_WORKLOAD
(indicated by the lightest blue/cyan edge). In the attached image, front-end
workload accesses user
and order
workloads. The order
workload accesses user
workload.
Between a API
node and a WORKLOAD
node there exists two types of relationships/edges. One is called a CONTAINS_API
(dark blue link in the image) and another is called a ACCESSES_API
(violet link in the image). For example, in the attached image, the user
workload contains eight API
s, Out of which six are consumed by front-end
(the list: includes 3 /customers/...
, /addresses
, /register
, /cards
). The two remaining API
s are consumed by the orders
(the list: /cards/{cardsId}
and /addresses/{addressesId}
.
The front-end
also accesses three API
that are contained by the orders
workload too (/orders/...
)
It is also possible that two WORKLOAD
s may have an ACCESSES_WORKLOAD
edge between them, but without any API
as node as common between them.
A WORKLOAD
may have an extra edge/relationship called CAUSES_ATTACK
to an API
in addition to the ACCESSES_API
edge. It is indicated by green color in the image. This may not exist always. In the attached image, the front-end
workload has this edge to the /catalogue/sock
API but not with any of the other API
s.
Now I want to get the list of all workload-pairs and the ID of the API
nodes in between them, if any. The cypher query that I tried is:
MATCH (w1)
MATCH (w1)-[awe:ACCESSES_WORKLOAD]->(w2 {deleted_time: -1})
OPTIONAL MATCH (w2)-[:CONTAINS_API]->(a:API)
OPTIONAL MATCH (w1)-[:ACCESSES_API]->(a)
OPTIONAL MATCH (w1)-[cae:CAUSES_ATTACK]->(a)
WHERE ID(w1) IN %s
RETURN DISTINCT awe, w1, w2, collect(ID(a)), collect(distinct cae.name)
An output that I was expecting was table like below:
w1 | w2 | API IDs List | Attack Name
======================================================================
front-end-node | user-node | [6 elements in a list] | null
orders-node | user-node | [2 elements in a list] | null
front-end-node | orders-node | [3 elements in a list] | null
front-end-node | some-node | [1 element in a list] | attack-name
But what I am getting is:
w1 | w2 | API IDs List | Attack Name
======================================================================
front-end-node | user-node | [8 elements in a list] | null
orders-node | user-node | [8 elements in a list] | null
front-end-node | orders-node | [3 elements in a list] | null
front-end-node | some-node | [1 element in a list] | attack-name
In the first two rows, for the user
node, I am getting the list of all the API
s contained by the user
workload and not just the API
s accessed by the front-end
and orders
. The collect(ID(a))
is aggregating all the API
nodes with a CONTAINS_API
edge/relationship with the w2 WORKLOAD
in the query instead of the API
s with only ACCESSES_API
with w1 workload
.
If I were to map my expectation to JSON for brevity, the output I require will be like:
{
sourceWorkloadID: <ID(front-end)>
targetWorkloadID: <ID(user)>
apis: [ ID(/cards), ID(/register), ID(/addresses), ID(/customers/...)] // len here is six
attack: null
},
{
sourceWorkloadID: <ID(orders)>
targetWorkloadID: <ID(user)>
apis: [ ID(/cards/{cardsId), ID(/addresses/{addressesId})] // len here is two
attack: null
}, {3rd row}, {4th row}
Can someone help fix the cypher query ? I am doing this in redisgraph
via cypher
. I cannot use neo4j-only query options/utils/helpers. Thanks.
The way to do this is to do the edge matching into the same line as the API MATCH:
MATCH (w1)
MATCH (w1)-[awe:ACCESSES_WORKLOAD]->(w2 {deleted_time: -1})
OPTIONAL MATCH (w2)-[:CONTAINS_API]->(a:API)<-[:ACCESSES_API]-(w1)
OPTIONAL MATCH (w1)-[cae:CAUSES_ATTACK]->(a)
WHERE ID(w1) IN %s
RETURN DISTINCT awe, w1, w2, collect(ID(a)), collect(distinct cae.name)