Tags: ruby, neo4j, neography

Neo4j: batch import of relationships


I'm having trouble importing relationships into a graph.

Let's say I have a few hundred unique, indexed users that have already been created. I'd then like to create about 120k nodes, each linked to one of those users through a relationship.

Unfortunately, I can't find a way to batch the import. I'm trying to do this with the neography Ruby gem, but since I'm very new to this environment I wouldn't mind using another approach if needed.

What I tried:

@neo.batch(
  [:get_node_index, 'user', 'user_id', '1'], # attempt to get the node from the index
  [:create_node, {"foo" => "bar"}],
  [:create_relationship, "has", "{0}", "{1}"]
) # => fails

and

@neo.batch(
  [:create_unique_node, "user", "user_id", "1"], # attempt to create or get the node
  [:create_node, {"foo" => "bar"}],
  [:create_relationship, "has", "{0}", "{1}"]
) # => fails

Note, however, that batching create_unique_node commands on their own does work.

The only way I could get the script to run is:

@neo.batch(
  [:create_node, {"user_id" => 1}], # works, but duplicates the node
  [:create_node, {"foo" => "bar"}],
  [:create_relationship, "has", "{0}", "{1}"]
) # => success

However, this duplicates all my user nodes, which is definitely not what I want. My question seems similar to this one, but I don't understand at all how I'm supposed to use the index when creating the relationships.

Any help would be much appreciated. Thanks in advance!


Solution

  • Since this question has been upvoted, I'm posting the workaround I have found since then:

    As mentioned in the question, it is possible to batch create_unique_node commands to create the nodes. The batch command then returns a list of results from which you can get the Neo4j id of each node. I don't remember whether I had to extract the ids from a hash structure, but you'll get the idea.

    So basically, I first ran one batch and stored the result in an array:

    ids = @neo.batch(
        # one [:create_unique_node, ...] or [:create_node, ...] command per node
    ) #=> one result per command, each containing the Neo4j id you can use later
    

    To link the nodes, I used a second batch command. Instead of the failing "{0}"-style job references, you can simply use the absolute Neo4j ids of the nodes, so a command looks like this:

    [:create_relationship, "something", id1, id2]
    

    where id1 and id2 are taken from ids.

    Basically, this workaround uses absolute node ids rather than relative batch references.
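A minimal Ruby sketch of the first step of this workaround, reading absolute node ids back out of the batch results. It assumes each element returned by `@neo.batch` is a Hash in the Neo4j REST batch response format, with the node's URL under `result["body"]["self"]` ending in its numeric id. `node_ids_from_batch` is a hypothetical helper, not a neography method, and `user_ids` in the usage comment is illustrative.

```ruby
# Sketch only: assumes the Neo4j REST batch response shape, e.g.
#   { "id" => 0, "from" => "/node",
#     "body" => { "self" => "http://localhost:7474/db/data/node/42" } }
# node_ids_from_batch is a hypothetical helper, not a neography method.
def node_ids_from_batch(results)
  results.map do |result|
    # A node URL ends with the node's numeric id, so take the last path segment.
    result.fetch("body").fetch("self").split("/").last.to_i
  end
end

# Hypothetical usage against a live server (user_ids is illustrative):
#   commands = user_ids.map do |uid|
#     [:create_unique_node, "user", "user_id", uid, { "user_id" => uid }]
#   end
#   ids = node_ids_from_batch(@neo.batch(*commands))
```

Neo4j returns batch results in job order, so `ids[i]` should correspond to the i-th command sent.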
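For the second step, here is a sketch that builds `:create_relationship` commands from absolute ids and splits them into several smaller batches, which seems prudent for roughly 120k relationships. It assumes `pairs` holds `[user_node_id, item_node_id]` integer pairs taken from earlier batch results; `relationship_batches` and the default slice size of 1,000 are hypothetical choices, not part of the original answer.

```ruby
# Sketch only: pairs is an Array of [from_node_id, to_node_id] integer pairs,
# e.g. [user_node_id, item_node_id], taken from earlier batch results.
# relationship_batches and the default slice size are hypothetical choices.
def relationship_batches(pairs, type, slice_size = 1_000)
  pairs.each_slice(slice_size).map do |slice|
    # Absolute ids replace the "{0}"/"{1}" job references that failed.
    slice.map { |from_id, to_id| [:create_relationship, type, from_id, to_id] }
  end
end

# Hypothetical usage: one HTTP batch request per slice.
#   relationship_batches(pairs, "has").each { |commands| @neo.batch(*commands) }
```

Whether a given neography version accepts plain integers as relationship endpoints is worth checking; the answer above reports that passing absolute ids worked.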