WHERE
Scope and Query Optimization in Fraud Detection GraphI'm new to Neo4j and trying to build a fraud detection graph using a dataset found online.
I am facing two main problems with my Cypher query:
Scope of WHERE
Clauses:
WHERE
statements apply to the entire following part of the query instead of only the relevant section.CALL
statements but couldn't make them work correctly.Memory Issue & Query Performance:
Before running the query, I created the following indexes:
CREATE INDEX person_index FOR (p:PERSONNE) ON (p.Numero_client);
CREATE INDEX phone_index FOR (t:TELEPHONE) ON (t.Numero);
CREATE INDEX car_index FOR (v:VOITURE) ON (v.Immatriculation);
CREATE INDEX email_index FOR (e:EMAIL) ON (e.Adresse_mail);
CREATE INDEX address_index FOR (a:ADRESSE) ON (a.Rue, a.Code_postal, a.Commune);
LOAD CSV WITH HEADERS FROM 'file:///data.csv' AS row
// 1⃣ - Create PERSONNE nodes
MERGE (p:PERSONNE {Numero_client: row.anc_ref})
SET p.Chainage = row.chainage,
p.Sexe = row.sexe,
p.cres_soc = row.cres_soc,
p.Age = row.age_cli,
p.Nom = row.nom_cli,
p.Prenom = row.pnom_cli,
p.Nom_conjoint = row.nom_cjt,
p.Prenom_conjoint = row.pnom_cjt
WITH p, row
WHERE row.cres_soc <> "00" AND row.dres_soc IS NOT NULL
SET p.dres_soc = row.dres_soc
// 2⃣ - Create TELEPHONE nodes and relationships
WITH p, row
UNWIND [row.num_tel1, row.num_tel2] AS tel
WITH p, row, tel
WHERE tel IS NOT NULL AND tel <> "" AND tel <> " "
MERGE (t:TELEPHONE {Numero: tel})
MERGE (p)-[:A_POUR_TELEPHONE {type: CASE WHEN tel = row.num_tel1 THEN "PRINCIPAL" ELSE "SECONDAIRE" END}]->(t)
// 3⃣ - Create VOITURE nodes
WITH p, row
WHERE row.num_mnrl1 IS NOT NULL AND row.num_mnrl1 <> "" AND row.num_mnrl1 <> " "
MERGE (v:VOITURE {Immatriculation: row.num_mnrl1})
MERGE (p)-[:POSSEDE_VOITURE]->(v)
// 4⃣ - Create EMAIL nodes
WITH p, row
WHERE row.mail IS NOT NULL AND row.mail <> "" AND row.mail <> " "
MERGE (e:EMAIL {Adresse_mail: row.mail})
MERGE (p)-[:A_POUR_MAIL]->(e)
// 5⃣ - Create ADRESSE nodes
WITH p, row
WHERE row.rue IS NOT NULL AND row.cd_post IS NOT NULL AND row.commune IS NOT NULL
MERGE (a:ADRESSE {Rue: row.rue, Code_postal: row.cd_post, Commune: row.commune})
MERGE (p)-[:RESIDE_A]->(a);
I tried breaking the query into smaller parts:
PERSONNE
nodesLOAD CSV WITH HEADERS FROM 'file:///data.csv' AS row
MERGE (p:PERSONNE {Numero_client: row.anc_ref})
SET p.Chainage = row.chainage,
p.Sexe = row.sexe,
p.cres_soc = row.cres_soc,
p.Age = row.age_cli,
p.Nom = row.nom_cli,
p.Prenom = row.pnom_cli,
p.Nom_conjoint = row.nom_cjt,
p.Prenom_conjoint = row.pnom_cjt
WITH p, row
WHERE row.cres_soc <> "00" AND row.dres_soc IS NOT NULL
SET p.dres_soc = row.dres_soc
TELEPHONE
nodes and relationshipsLOAD CSV WITH HEADERS FROM 'file:///data.csv' AS row
MATCH (p:PERSONNE {Numero_client: row.anc_ref})
WITH p, row
UNWIND [row.num_tel1, row.num_tel2] AS tel
WITH p, row, tel
WHERE tel IS NOT NULL AND tel <> "" AND tel <> " "
MERGE (t:TELEPHONE {Numero: tel})
MERGE (p)-[:A_POUR_TELEPHONE {type: CASE WHEN tel = row.num_tel1 THEN "PRINCIPAL" ELSE "SECONDAIRE" END}]->(t)
This approach sometimes works, but most of the time, I get one of the following errors:
ServiceUnavailable
WebSocket connection failure. Due to security constraints in your web browser, the reason for the failure is not available to this Neo4j Driver. Please use your browser's development console to determine the root cause of the failure.
Common reasons include:
- The database being unavailable
- Using the wrong connection URL
- Temporary network problems
- Browser not trusting the Neo4j certificate
WebSocket `readyState` is: 3
Or:
Neo.TransientError.General.OutOfMemoryError
There is not enough memory to perform the current task.
Please try increasing 'server.memory.heap.max_size' in the neo4j configuration (normally in 'conf/neo4j.conf' or, if you are using Neo4j Desktop, found through the user interface)
or if you are running an embedded installation increase the heap by using '-Xmx' command line flag, and then restart the database.
WHERE
statements only apply to the relevant section of my query?MATCH
after LOAD CSV
?Any help would be greatly appreciated!
This answer attempts to address questions 1, 2, and 3.
You can use CALL subqueries to perform updates to the DB, and that can be done in ways that do not affect the number of rows in the outer query. For example, most of the subqueries in the example below are Unit subqueries that do not RETURN
anything), which never affect the number of rows in the outer query. The first subquery does RETURN something (p
), but it returns a value for every outer row (because it does not filter out any rows), so it also does not affect the number of outer rows.
Also, you can have those subqueries execute multiple transactions, to process subquery rows in batches (of 1000, by default), to avoid running out of memory.
Potentially, something like this may work for you:
:auto // Only needed if running in Neo4j Browser
LOAD CSV WITH HEADERS FROM 'file:///data.csv' AS row
// 1 - Create PERSONNE nodes
CALL (row) {
MERGE (p:PERSONNE {Numero_client: row.anc_ref})
SET
p.Chainage = row.chainage,
p.Sexe = row.sexe,
p.cres_soc = row.cres_soc,
p.dres_soc = CASE WHEN row.cres_soc <> "00" THEN row.dres_soc END,
p.Age = row.age_cli,
p.Nom = row.nom_cli,
p.Prenom = row.pnom_cli,
p.Nom_conjoint = row.nom_cjt,
p.Prenom_conjoint = row.pnom_cjt
RETURN p
} IN TRANSACTIONS
// 2 - Create TELEPHONE nodes and relationships
CALL (p, row) {
WITH p, row
UNWIND [row.num_tel1, row.num_tel2] AS tel
WITH p, row, tel
WHERE tel IS NOT NULL AND tel <> "" AND tel <> " "
MERGE (t:TELEPHONE {Numero: tel})
MERGE (p)-[:A_POUR_TELEPHONE {type: CASE WHEN tel = row.num_tel1 THEN "PRINCIPAL" ELSE "SECONDAIRE" END}]->(t)
} IN TRANSACTIONS
// 3 - Create VOITURE nodes
CALL (p, row) {
WITH p, row
WHERE row.num_mnrl1 IS NOT NULL AND row.num_mnrl1 <> "" AND row.num_mnrl1 <> " "
MERGE (v:VOITURE {Immatriculation: row.num_mnrl1})
MERGE (p)-[:POSSEDE_VOITURE]->(v)
} IN TRANSACTIONS
// 4 - Create EMAIL nodes
CALL (p, row) {
WITH p, row
WHERE row.mail IS NOT NULL AND row.mail <> "" AND row.mail <> " "
MERGE (e:EMAIL {Adresse_mail: row.mail})
MERGE (p)-[:A_POUR_MAIL]->(e)
} IN TRANSACTIONS
// 5 - Create ADRESSE nodes
CALL (p, row) {
WITH p, row
WHERE row.rue IS NOT NULL AND row.cd_post IS NOT NULL AND row.commune IS NOT NULL
MERGE (a:ADRESSE {Rue: row.rue, Code_postal: row.cd_post, Commune: row.commune})
MERGE (p)-[:RESIDE_A]->(a)
} IN TRANSACTIONS
Notes:
:auto
command should only be used if running in the Neo4j Browser (it is needed for the IN TRANSACTIONS
functionality).WITH
clauses in each subquery.p
nodes, have to held in memory), in which case you many want to split the file into smaller ones to be processed separately.