ruby-on-rails heroku neo4j puma neo4j.rb

Reconnect Neo4jrb connections after puma fork with preload_app?


Heroku suggests using preload_app, but I am seeing all of these problems in my local dev environment.

  • I am using MRI 2.2.3 and neo4j.rb 6.0.0 (ActiveNode models) with the HTTP adapter
  • I am also on OSX 10.11.2 with Neo4j 2.2.5 and jdk1.7.0_51-b53

I've tried lots of things to reopen connections after fork; this was the closest I could get:

cfg = Rails.application.config.neo4j
Neo4j::Session.set_current(Neo4j::Session.open(cfg.session_type, cfg.session_path, cfg.session_options))
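
For context, the hook puma provides for per-worker setup when preload_app is on is on_worker_boot. A sketch of a config/puma.rb that would run the two lines above in each forked worker, assuming the worker and thread counts from the local setup listed under "Other config" (the minimum thread count is an assumption). Whether this reopen is needed at all is the open question here, so treat it as an illustration of where the code would go rather than a recommended fix:

```ruby
# config/puma.rb (sketch, not a verified fix)
workers 2        # forked worker processes per server
threads 5, 5     # min, max threads per worker; the min of 5 is an assumption
preload_app!     # load the app in the master process before forking

on_worker_boot do
  # Runs once in each worker process after fork.
  cfg = Rails.application.config.neo4j
  Neo4j::Session.set_current(
    Neo4j::Session.open(cfg.session_type, cfg.session_path, cfg.session_options)
  )
end
```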

BUT:

  • When I do reopen connections like this I keep having problems as though Faraday is returning a request's results twice
    • Basically neo4j.rb calls the connection URL (http://localhost:7474/) to get the data URL, then calls the data URL. But the connection URL for other workers sometimes returns the response from the data URL, which then blows up later when it tries to actually query Neo4j.
  • When I do not reopen connections, things seem to work although I haven't simulated enough load to know for sure

I suspect the neo4j-core session (neo4j-core on GitHub) is the limiting factor, and that I should just forget about reopening the session after fork since the gem only has one shared session.

But I'm not 100% confident in this and massive googling gave me nothing. Can anyone confirm that basically there is no need to reopen the connection after forking?

I'd also love to know if:

  1. There is some kind of connection pooling happening somewhere else
  2. I need to configure it somehow
  3. The gem itself needs to have pooling as a feature/pull request
  4. I should just be using embedded mode if performance is an issue

I'm not at a scale where performance is an issue, but I want to make sure I know what will be needed when I am. Thanks.

Other config:

  • This is on a MacBook Pro with a 4-core i7 = 8 logical processors
  • I am using foreman to launch 2 puma servers, one on port 80 and one on port 443
    • yes I know I should proxy via nginx :-) This is just a quick setup for local dev
    • each server has 2 workers and 5 threads = 10 threads (ie threads > # processors)

Solution

  • I have to admit I didn't know much about connection pools until I read up on them yesterday. I think I have a better handle on the concept now and I've been giving some thought to how they might apply to the gem.

    Currently you get the session via Neo4j::Session.current. The code for that method is just this:

      def current
        @@current_session
      end
    

    So basically it uses a class variable. I'm pretty sure that's not thread safe ;) I think we should be doing something like Thread.current[:neo4j_curr_session] instead so that there is a session for each thread. The class variable hasn't caused any problems so far, but maybe it has caused problems and I haven't recognized them yet.
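
    To make the difference concrete, here is a minimal sketch that contrasts the two storage strategies. FakeSession, ClassVarRegistry and ThreadLocalRegistry are hypothetical names made up for the illustration; this is not the gem's code, only the same idea in miniature:

```ruby
# Hypothetical stand-in for a session object; not a neo4j-core class.
FakeSession = Struct.new(:owner)

# Mirrors the current approach: one class variable shared by every thread.
module ClassVarRegistry
  @@current_session = nil

  def self.current
    @@current_session
  end

  def self.set_current(session)
    @@current_session = session
  end
end

# The proposed approach: each thread stores and reads its own session.
module ThreadLocalRegistry
  KEY = :neo4j_curr_session

  def self.current
    Thread.current[KEY]
  end

  def self.set_current(session)
    Thread.current[KEY] = session
  end
end

ClassVarRegistry.set_current(FakeSession.new("main"))
ThreadLocalRegistry.set_current(FakeSession.new("main"))

# A second thread sets "its" session in both registries.
Thread.new do
  ClassVarRegistry.set_current(FakeSession.new("worker"))
  ThreadLocalRegistry.set_current(FakeSession.new("worker"))
end.join

# Back on the main thread: the class variable was overwritten by the
# worker, while the thread-local value is untouched.
puts ClassVarRegistry.current.owner
puts ThreadLocalRegistry.current.owner
```

    One caveat: Thread.current[:key] is fiber-local in Ruby, so code that runs inside extra fibers would need Thread#thread_variable_get and Thread#thread_variable_set to get storage that is shared across the whole thread.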

    Regarding connection pools: The typical use case, I think, is something like puma where you have a thread for each web server worker. If each one of those had a session then you probably would generally not have many sessions open. That said, I can understand how somebody may want:

    • To just run a Ruby script which would have lots of threads, but wouldn't want a session for each thread
    • To have lots of web worker threads because they have that much load and don't want a session for each thread
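
    For both of those cases, a pool lets many threads share a small, fixed number of sessions. A minimal sketch using only Ruby's standard Queue; TinySessionPool is a hypothetical name, and Object.new stands in for whatever would actually open a session. Nothing like this exists in the gem today:

```ruby
require "thread" # Queue; already loaded on modern Rubies, harmless otherwise

# Hypothetical fixed-size pool: threads borrow a session and hand it back.
class TinySessionPool
  def initialize(size, &factory)
    @sessions = Queue.new
    size.times { @sessions << factory.call }
  end

  # Blocks until a session is free, yields it, then always returns it.
  def with_session
    session = @sessions.pop
    yield session
  ensure
    @sessions << session if session
  end
end

# Two pooled "sessions" shared by eight threads.
POOL = TinySessionPool.new(2) { Object.new }
SEEN = Queue.new

threads = 8.times.map do
  Thread.new do
    POOL.with_session { |session| SEEN << session.object_id }
  end
end
threads.each(&:join)
```

    Every thread gets its turn, but no more than two distinct sessions are ever in use, which is the trade-off against the one-session-per-thread approach.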

    I think I'm reasoning about this correctly, but definitely let me know if not. I realize it's not quite an "answer", but as one of the maintainers I'm answering your question about the current state of the project ;)

    If you'd like to talk more, please join us on Gitter