I have a csv
file wherein some fields are array-types. Fields are separated with ,
and array items are separated with ;
. For example:
index, name, friends, neighbors
0,Jim,John;Tim;Fred,Susan;Megan;Cheryl
1,Susan,Jim;John,Megan;Cheryl
2,Sean,,,
where Jim
has three friends, John
, Tim
, and Fred
, and three neighbors, Susan
, Megan
, and Cheryl
, and Sean
has no friends and no neighbors.
However, when I read this into neo4j
using apoc.load.csv
, I end up with list properties with empty strings inside of them (rather than empty lists). For example:
CALL apoc.periodic.iterate("
CALL apoc.load.csv('file.csv',
{header:true,sep:',',
mapping:{
friends:{array:true},
neighbors:{array:true}}
})
YIELD map as row RETURN row
","
CREATE (p:Person) SET p = row
",
{batchsize:50000, iterateList:true, parallel:true});
Gives me a Person
with name Sean
but with friends=[ "" ]
and neighbors=[ "" ]
.
What I want is Sean
to have friends=[]
and neighbors=[]
.
Thank you!
Make sure there are no extraneous spaces in your CSV file header (or else some property names will start with a space):
index,name,friends,neighbors
0,Jim,John;Tim;Fred,Susan;Megan;Cheryl
1,Susan,Jim;John,Megan;Cheryl
2,Sean,,,
Use list comprehension to help eliminate all friends
and neighbors
elements that are empty strings:
CALL apoc.periodic.iterate(
"CALL apoc.load.csv(
'file.csv',
{
header:true, sep:',',
mapping: {
friends: {array: true},
neighbors: {array: true}
}
}) YIELD map
RETURN map
",
"CREATE (p:Person)
SET p = map
SET p.friends = [f IN p.friends WHERE f <> '']
SET p.neighbors = [n IN p.neighbors WHERE n <> '']
",
{batchsize:50000, iterateList:true, parallel:true}
);
With the above changes, this query:
MATCH (person:Person) RETURN person;
returns this result:
╒══════════════════════════════════════════════════════════════════════╕
│"person" │
╞══════════════════════════════════════════════════════════════════════╡
│{"name":"Jim","index":"0","neighbors":["Susan","Megan","Cheryl"],"frie│
│nds":["John","Tim","Fred"]} │
├──────────────────────────────────────────────────────────────────────┤
│{"name":"Susan","index":"1","neighbors":["Megan","Cheryl"],"friends":[│
│"Jim","John"]} │
├──────────────────────────────────────────────────────────────────────┤
│{"name":"Sean","index":"2","neighbors":[],"friends":[]} │
└──────────────────────────────────────────────────────────────────────┘
[UPDATED]
Also, if it is not possible for your CSV file to contain an "empty" friend or neighbor substring (e.g., John;;Fred
), then this version of the query that uses CASE
instead of list comprehension would be more efficient:
CALL apoc.periodic.iterate(
"CALL apoc.load.csv(
'file.csv',
{
header:true, sep:',',
mapping: {
friends: {array: true},
neighbors: {array: true, arraySep:';'}
}
}) YIELD map
RETURN map
",
"CREATE (p:Person)
SET p = map
SET p.friends = CASE p.friends WHEN [''] THEN [] ELSE p.friends END
SET p.neighbors = CASE p.neighbors WHEN [''] THEN [] ELSE p.neighbors END
",
{batchsize:50000, iterateList:true, parallel:true}
);