I'm using LOAD CSV to import data from CSV into Neo4j. My dataset contains multiple values in the country field. Currently I'm using a semicolon as the separator for those values.
nodes-person.csv
id,country
http://author,country1;country2;country3
And this is the Cypher query I use to import the data into Neo4j:
LOAD CSV WITH HEADERS FROM "file:///nodes-person.csv" AS csvLine
MERGE (p:`person` {id: csvLine.id})
ON CREATE
SET
p.country = split(csvLine.country,";")
ON MATCH
SET
p.country = split(csvLine.country,";")
RETURN p;
My question is: how can I split the values properly if the values themselves contain the separator character?
For example:
country\\;1 ; country\\;2 ; country\\;3
You've got a couple of options - one is pure Cypher and slightly untidy, the other is using APOC and regular expressions. I'm making the assumption that if the semicolon appears within a country name it's escaped with a single backslash.
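The plan here is to do it in three steps:
1. Replace the escaped semicolons (\;) with a placeholder that won't appear in the data, such as __SEMICOLON__
2. Split the string on the remaining semicolons
3. Replace the placeholder (__SEMICOLON__) instances with a semicolon character

Something like the following would work (the WITH is just so it's runnable in isolation):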
WITH 'country1\\;;country2;country3\\;' as countries
RETURN [x in split(replace(countries, '\\;', '__SEMICOLON__'), ';') | replace(x, '__SEMICOLON__', ';')]
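Plugged into the import query from the question, the same expression might look like the sketch below. It assumes the backslash before the semicolon survives CSV parsing unchanged and that __SEMICOLON__ never occurs in the real data. Since the question sets the same value on create and on match, a single SET after the MERGE is used here instead.

```cypher
LOAD CSV WITH HEADERS FROM "file:///nodes-person.csv" AS csvLine
MERGE (p:`person` {id: csvLine.id})
// Assumption: '\;' marks a literal semicolon inside a country name.
// 1) hide escaped semicolons behind a placeholder,
// 2) split on the remaining (real) separators,
// 3) turn each placeholder back into a plain semicolon.
SET p.country = [x IN split(replace(csvLine.country, '\\;', '__SEMICOLON__'), ';')
                 | replace(x, '__SEMICOLON__', ';')]
RETURN p;
```

If the values carry padding around the separator, as in the question's example, wrapping the inner replace in trim() would strip it.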
A tidier approach is to use apoc.text.split and supply a regular expression as the 'separator', so that the string is split only on semicolons that are not preceded by the backslash escape character:
WITH 'country1\\;;country2;country3\\;' as countries
RETURN [x in apoc.text.split(countries, '(?<!\\\\);') | replace(x, '\\;', ';')]
The list comprehension does a final tidy-up, replacing the escaped semicolons with plain semicolons for storage. The regex is shamelessly stolen from this answer.
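For the actual import, the regex split could be dropped into the question's query roughly as follows. This is a sketch that assumes the APOC plugin is installed and that the backslash escape reaches Cypher unchanged after CSV parsing.

```cypher
LOAD CSV WITH HEADERS FROM "file:///nodes-person.csv" AS csvLine
MERGE (p:`person` {id: csvLine.id})
// '(?<!\\\\);' is the Cypher string form of the regex (?<!\\);
// i.e. a semicolon that is not immediately preceded by a backslash.
SET p.country = [x IN apoc.text.split(csvLine.country, '(?<!\\\\);')
                 | replace(x, '\\;', ';')]  // unescape for storage
RETURN p;
```

Compared with the placeholder approach, this avoids having to pick a marker string that is guaranteed not to appear in the data.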