I have a schema which looks like below:
A customer
is linked to another customer
with a relationship SIMILAR
having similarity score
.
Example: (c1:Customer)-->(c2:Customer)
An Email
node is connected to each customer
with relationship MAIL_AT
with the following node properties:
{
"active_email_address": "a@mail.com",
"cibil_email_addresses": [
"b@mail.com", "c@mail.com"
]
}
Example: (e1:Email)<-[:MAIL_AT]-(c1:Customer)-[:SIMILAR]->(c2:Customer)-[:MAIL_AT]->(e2:Email)
Risk
node with some risk-related properties (below) and is related to customer
with relationship HAS_RISK
: {
"f0_score": 870.0,
"pta_score": 430.0
}
Fraud
node with some fraud-related properties (below) and is related to customer
with relationship IS_FRAUD
: {
"has_commited_fraud": true
}
My Objectives:
My tentative solution:
MATCH (email:Email)
WITH email.cibil_email_addresses + email.active_email_address AS emailAddress, email
UNWIND emailAddress AS eaddr
WITH DISTINCT eaddr AS deaddr, email
UNWIND deaddr AS eaddress
MATCH (customer:Customer)-[]->(someEmail:Email)
WHERE eaddress IN someEmail.cibil_email_addresses + someEmail.active_email_address
WITH eaddress, COLLECT(customer.customer_id) AS customers
RETURN eaddress, customers
Problem: It is taking forever to execute this. Working with lists will take time I understand, however, I'm flexible to change the schema (If suggested). Should I break the email address into separate nodes? If yes, then how do I break cibil_email_addresses
into different nodes as they can vary - Should I create two nodes with different cibil email addresses
and connect both of them to customer with relationship HAS_CIBIL_EMAIL
? (Is this a valid schema design). Also, it is possible, a customer's active_email_address
is present in other customer's cibil_email_address
. I'm trying to find a synthetic identity attack. PS: If there exists some APOC
that can help achieve this and below, do suggest with example.
In production, for a given customer with email addresses
, risk values
, similarity score
, and also given other customers may or may not be tagged with fraud_status
, I want to check whether this new person will fall in a fraud ring or not. PS: If I need to use any gds
to solve this, please suggest with examples.
If I were to do this same exercise with some other node such as Address
which may be partially matching and will be having same list of historical addresses in a list, what should be my ideal approach?
I know, I'm tagging someone in my question, but that person only seems to be active with respect to Cypher on StackOverflow. @cybersam any help? Thanks.
This should work:
MATCH (e:Email)
UNWIND (e.cibil_email_addresses + e.active_email_address) AS address
WITH address, COLLECT(e) AS es
UNWIND es AS email
MATCH (email)<-[:MAIL_AT]-(cust)
RETURN address, COLLECT(cust) AS customers
The WITH
clause takes advantage of the arregating function COLLECT
to automatically collect all the Email
nodes containing the same address, using address
as the grouping key.
You should only ask one question at a time. You have a couple of other questions at the bottom. If you continue to need help with them, please create new questions.