I'm new to Neo4j, and I want to load a JSON file with the following structure into my Neo4j DB:
{
  "nodes": [
    {
      "last_update": 1629022369,
      "pub_key": "pub1",
      "alias": "alias1"
    },
    {
      "last_update": 1618162974,
      "pub_key": "pub2",
      "alias": "alias2"
    },
    {
      "last_update": 1634745976,
      "pub_key": "pub3",
      "alias": "alias3"
    }
  ],
  "edges": [
    {
      "node1_pub": "pub1",
      "node2_pub": "pub2",
      "capacity": "37200"
    },
    {
      "node1_pub": "pub2",
      "node2_pub": "pub3",
      "capacity": "37200"
    },
    {
      "node1_pub": "pub3",
      "node2_pub": "pub1",
      "capacity": "37200"
    }
  ]
}
I load nodes and edges in separate queries:
WITH "file:///graph.json" AS graph
CALL apoc.load.json(graph) YIELD value
FOREACH (nodeObject in value.nodes | CREATE (node:Node {pubKey:nodeObject.pub_key}))
WITH "file:///graph.json" AS graph
CALL apoc.load.json(graph) YIELD value
UNWIND value.edges as edgeObject
MATCH (node1:Node {pubKey: edgeObject.node1_pub})
MATCH (node2:Node {pubKey: edgeObject.node2_pub})
CREATE (node1)-[:IS_CONNECTED {capacity: edgeObject.capacity}]->(node2)
This works fine with a small number of edges, but I have a ~100 MB file with a lot of edges in it, and in that case the query never returns. I'm running it from the Neo4j web interface. Neo4j is running in Docker with the max heap size set to 3 GB, which should be more than enough.
I haven't grasped all of the concepts of Cypher yet, so there is probably a better way to do this anyway, ideally in a single query so the file does not have to be loaded twice.
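Something like the following is what I have in mind (untested sketch based on the file above; I believe the nodes created by the FOREACH should be visible to the later MATCH clauses since it all runs in one transaction), but I don't know whether it would perform any better:

CALL apoc.load.json("file:///graph.json") YIELD value
FOREACH (nodeObject IN value.nodes | CREATE (:Node {pubKey: nodeObject.pub_key}))
WITH value
UNWIND value.edges AS edgeObject
MATCH (node1:Node {pubKey: edgeObject.node1_pub})
MATCH (node2:Node {pubKey: edgeObject.node2_pub})
CREATE (node1)-[:IS_CONNECTED {capacity: edgeObject.capacity}]->(node2)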
Thanks a lot!
Okay, after trying out the batching suggested by @jose_bacoy (roughly the apoc.periodic.iterate pattern sketched below), I saw that even 1000 rows took around 20 s. The MATCH operations are obviously quite expensive without an index, since each one has to scan all :Node nodes. After I created an index, the import of 80k edges worked like a charm:
CREATE INDEX FOR (n:Node) ON (n.pubKey)
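In case it helps anyone, the batched edge import with apoc.periodic.iterate looks roughly like this (the batch size and options here are just illustrative):

CALL apoc.periodic.iterate(
  'CALL apoc.load.json("file:///graph.json") YIELD value
   UNWIND value.edges AS edgeObject
   RETURN edgeObject',
  'MATCH (node1:Node {pubKey: edgeObject.node1_pub})
   MATCH (node2:Node {pubKey: edgeObject.node2_pub})
   CREATE (node1)-[:IS_CONNECTED {capacity: edgeObject.capacity}]->(node2)',
  {batchSize: 1000, parallel: false}
)

With the index in place, each MATCH becomes an index lookup instead of a label scan, which is what made the difference.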