ruby-on-rails neo4j neography

Parse a big file and populate a Neo4j database


I am working on a Ruby on Rails project that will read and parse a somewhat big text file (around 100k lines) and build Neo4j nodes from that data (I am using Neography). This is the Neo4j-related part of the code I wrote:

    # Find the existing node whose name matches the parsed id
    d = Neography::Rest.new.execute_query("MATCH (n:`Label`) WHERE (n.`name`='#{id}') RETURN n")
    d = Neography::Node.load(d, @neo)
    # Create the new node and label it
    p = Neography::Rest.new.create_node("name" => "#{id}")
    Neography::Rest.new.add_label(p, "LabelSample")
    # Link the new node to the existing one
    d = Neography::Rest.new.get_node(d)
    Neography::Rest.new.create_relationship("belongs_to", p, d)

So, what I want to do is: search the already populated db for the node whose "name" field matches the parsed data, create a new node for this data, and finally create a relationship between the two. Obviously this code takes way too long to run, so I tried Neography's batch, but I ran into some issues.

    p = Neography::Rest.new.batch [:create_node, {"name" => "#{id}"}]

gave me an "undefined method `split' for nil:NilClass" error in

    id["self"].split('/').last

    d=Neography::Rest.new.batch [:get_node, d]

gives me a Neography::UnknownBatchOptionException for get_node

I am not even sure this would save me enough time.

I also tried other approaches, Batch Import for example, but I couldn't find a way to get the already created node I need from the db. As you can see, I'm kinda new to this, so any help will be appreciated. Thanks in advance.


Solution

  • You might be able to do this with pure Cypher (or Neography-generated Cypher). Something like this, perhaps:

    MATCH (n:Label) WHERE n.name={id}
    WITH n
    CREATE (p:LabelSample {name: n.name})-[:belongs_to]->(n)
    

    Note that I'm using CREATE here. If you don't want to create duplicate LabelSample nodes, you could use MERGE instead:

    MATCH (n:Label) WHERE n.name={id}
    WITH n
    MERGE (p:LabelSample {name: n.name})
    CREATE (p)-[:belongs_to]->(n)
    

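    Note that I'm using params ({id}) rather than interpolating the value into the query string. Params are generally recommended for performance: Neo4j caches the plan for a query string and can reuse it when only the parameter values change, which adds up when the same query runs once per line of your file.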
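Calling the MERGE query from Ruby could look roughly like the sketch below. It is a minimal sketch, not a tested Neography recipe: it assumes `Neography::Rest#execute_query(query, params)` and `#batch(*commands)`, and it assumes your Neography version accepts `[:execute_query, query, params]` as a batch command (the `UnknownBatchOptionException` in the question suggests batch support varies between versions, so check the README of the version you have installed). The names `LINK_QUERY`, `link_to_label`, `link_in_batches`, `parse_id`, and `"data.txt"` and the slice size of 500 are all hypothetical. The sketch reuses one client instead of calling `Neography::Rest.new` per request, passes `id` as a parameter, and optionally groups queries so that 100k lines become about 200 batch requests instead of several requests per line. `{id}` is the parameter syntax of the Neo4j 2.x/3.x servers Neography talks to; Neo4j 4.0 and later use `$id`.

```ruby
# Minimal sketch (hypothetical helper names).
# ASSUMPTIONS: `client` responds to execute_query(query, params) and
# batch(*commands) the way Neography::Rest does, and your Neography version
# accepts [:execute_query, query, params] as a batch command.

LINK_QUERY = <<~CYPHER
  MATCH (n:Label) WHERE n.name = {id}
  WITH n
  MERGE (p:LabelSample {name: n.name})
  CREATE (p)-[:belongs_to]->(n)
CYPHER

# One HTTP request per id. The id travels in the params hash and is never
# interpolated into the query text, so the query string stays identical.
def link_to_label(client, id)
  client.execute_query(LINK_QUERY, { "id" => id.to_s })
end

# One HTTP request per slice of ids, via the batch endpoint.
# slice_size = 500 is an arbitrary starting point, not a tuned value.
def link_in_batches(client, ids, slice_size = 500)
  ids.each_slice(slice_size).map do |slice|
    commands = slice.map { |id| [:execute_query, LINK_QUERY, { "id" => id.to_s }] }
    client.batch(*commands)
  end
end

# Usage (parse_id and "data.txt" are hypothetical):
#   neo = Neography::Rest.new
#   ids = File.foreach("data.txt").map { |line| parse_id(line) }
#   link_in_batches(neo, ids)
```

Because the question's code stores names as strings (`"#{id}"`), the sketch converts each id with `to_s` so the `WHERE` clause compares strings with strings. Pass `link_in_batches` an Array or a plain Enumerator rather than a lazy one, since a lazy `map` would never actually send the batches.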