ruby csv neo4j neography

How to save a Neo4j database?


I'm using Neo4j for the first time, through the neography gem for Ruby. My data is in CSV files, and I can successfully populate the database from my main file, i.e. create all the nodes. For each CSV file (here, user.csv), I do this:

def create_person(name, id)
  Neography::Node.create("name" => name, "id" => id)
end

CSV.foreach('user.csv', :headers => true) do |row|
  id = row[0].to_i()
  name = row[1]
  $persons[id] = create_person(name, id)
end

I do the same for the other files. There are two issues. First, this works when the files are very small, but with slightly larger files (I'm dealing with four 1 MB files) I get:

SocketError: Too many open files (http://localhost:7474)

The second issue is that I don't want to populate the database every time I run this Ruby file. I want to load the data once and then leave the database alone, only running queries against it. Can anyone tell me how to populate it and save it, and then how to load it whenever I want to use it? Thank you.


Solution

  • Create a @neo client:

      @neo = Neography::Rest.new
    

    Create a queue:

      @queue = []
    

    Use the batch API for data loading, so that nodes are created 500 at a time instead of issuing one request per node:

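    def create_person(name, id)
      @queue << [:create_node, {"name" => name, "id" => id}]
      # Send the queued commands in one batch request once 500 have accumulated.
      if @queue.size >= 500
        batch_results = @neo.batch(*@queue)
        @queue = []
        batch_results.each do |result|
          # Key by the Neo4j node id, taken from the end of the node's "self" URL.
          node_id = result["body"]["self"].split('/').last
          $persons[node_id] = result
        end
      end
    end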
    

    Run through your CSV file:

    CSV.foreach('user.csv', :headers => true) do |row|
      create_person(row[1], row[0].to_i)
    end
    

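    Flush the leftovers, i.e. whatever is still in the queue after the last full batch:

    batch_results = @neo.batch(*@queue)
    batch_results.each do |result|
      # Key by the Neo4j node id, taken from the end of the node's "self" URL.
      node_id = result["body"]["self"].split('/').last
      $persons[node_id] = result
    end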
    

    For an example of data loading via the REST API, see https://github.com/maxdemarzi/neo_crunch/blob/master/neo_crunch.rb

    For an example of using a queue for writes, see http://maxdemarzi.com/2013/09/05/scaling-writes/
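
The queue-and-flush steps above can be folded into one small helper. The following is only a sketch. `BatchLoader` and `FakeClient` are hypothetical names, not part of neography. `FakeClient` stands in for `Neography::Rest` so the example runs without a Neo4j server, and its response shape is an assumption modelled on the `result["body"]["self"]` lookup used above.

```ruby
# Hypothetical helper (not part of neography): queues node-creation
# commands and sends them to the client in batches of batch_size.
class BatchLoader
  attr_reader :results

  def initialize(client, batch_size = 500)
    @client = client
    @batch_size = batch_size
    @queue = []
    @results = []
  end

  # Queue one node; send the batch as soon as the queue is full.
  def add(name, id)
    @queue << [:create_node, { "name" => name, "id" => id }]
    flush if @queue.size >= @batch_size
  end

  # Send whatever is queued. Safe to call when the queue is empty.
  def flush
    return if @queue.empty?
    @results.concat(@client.batch(*@queue))
    @queue = []
  end
end

# Stand-in for Neography::Rest, so the sketch runs without a server.
# Assumption: each batch result exposes the node URL under ["body"]["self"].
class FakeClient
  attr_reader :batch_sizes

  def initialize
    @batch_sizes = []
    @next_id = 0
  end

  def batch(*commands)
    @batch_sizes << commands.size
    commands.map do |_operation, properties|
      @next_id += 1
      { "body" => { "self" => "http://localhost:7474/db/data/node/#{@next_id}",
                    "data" => properties } }
    end
  end
end

client = FakeClient.new
loader = BatchLoader.new(client, 500)
1.upto(1200) { |i| loader.add("user#{i}", i) }
loader.flush # send the leftovers that did not fill a whole batch
```

With a real server, passing `Neography::Rest.new` in place of `FakeClient.new` should work the same way, assuming its `batch` method accepts the commands as shown in the answer. Calling `flush` once after the CSV loop replaces the separate leftovers step.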