python neo4j py2neo neomodel

"Random" SocketError/Connection Refused errors on py2neo queries


Hullo, hope this doesn't end up being too trivial.

The relevant parts of my stack are Gunicorn/Celery, neomodel (0.3.6), and py2neo (1.5). The Neo4j version is 1.9.4, bound on 0.0.0.0:7474. All of this is on Linux (Ubuntu 13.04, I think).

So my gunicorn/celery servers are fine most of the time, except occasionally, I get the following error:

ConnectionRefusedError(111, 'Connection refused')

Stacktrace (most recent call last):
  File "flask/app.py", line 1817, in wsgi_app
    response = self.full_dispatch_request()
  File "flask/app.py", line 1477, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "flask/app.py", line 1381, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "flask/_compat.py", line 33, in reraise
    raise value
  File "flask/app.py", line 1475, in full_dispatch_request
    rv = self.dispatch_request()
  File "flask/app.py", line 1461, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "Noomsa/web/core/util.py", line 156, in inner
    user = UserMixin().get_logged_in()
  File "Noomsa/web/core/util.py", line 117, in get_logged_in
    user = models.User.index.get(username=flask.session["user"])
  File "neomodel/index.py", line 50, in get
    nodes = self.search(query=query, **kwargs)
  File "neomodel/index.py", line 41, in search
    return [self.node_class.inflate(n) for n in self._execute(str(query))]
  File "neomodel/index.py", line 28, in _execute
    return self.__index__.query(query)
  File "py2neo/neo4j.py", line 2044, in query
    self.__uri__, quote(query, "")
  File "py2neo/rest.py", line 430, in _send
    raise SocketError(err)

So, as you can see, I call User.index.get (the first call in the request/response cycle) and get a socket error. Sometimes. Most of the time, it connects fine. The error occurs across all Flask views/Celery tasks that use the neo4j connection (and not just those doing User.index.get ;)).

So far, the steps I've taken have involved monkey patching the neomodel connection function to make sure that a GraphDatabaseService object is created per thread, and to automatically reconnect (and authenticate) to the neo4j server every 30 or so seconds. This may have reduced the frequency of the errors, but they still occur.
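For readers who want to picture the per-thread, time-limited connection described above, here is a minimal sketch. It is not the actual monkey patch: `PerThreadConnection`, `factory`, `max_age` and `clock` are hypothetical names, and `factory` stands in for whatever call creates and authenticates the client object.

```python
import threading
import time


class PerThreadConnection:
    """Hold one connection per thread and rebuild it once it gets too old."""

    def __init__(self, factory, max_age=30.0, clock=time.monotonic):
        self._factory = factory    # hypothetical: callable returning a new client
        self._max_age = max_age    # seconds before a connection is rebuilt
        self._clock = clock        # injectable so the age check can be tested
        self._local = threading.local()

    def get(self):
        now = self._clock()
        conn = getattr(self._local, "conn", None)
        created = getattr(self._local, "created", 0.0)
        # Build a connection on first use in this thread, or when it has expired.
        if conn is None or now - created >= self._max_age:
            conn = self._factory()
            self._local.conn = conn
            self._local.created = now
        return conn
```

Each thread calling `get()` receives its own object, and a thread that has held one for `max_age` seconds or longer gets a fresh one on its next call.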

Looking for the error online, it seems to be mostly people trying to connect to the wrong interface/ip/port. However, given that the majority of my requests go through, I don't feel like that is the case here.

Any ideas? I don't think it's related, but my database seems to have 38k orphaned nodes; that's probably worthy of another question in its own right.

EDIT: I should add, this seems to disappear when running gunicorn/celery with workers=1, instead of workers=$N_CPU. I can't see why it should matter, as apparently neo4j is set up to handle $N_CPU*10 connections by default.
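One way to check whether the server itself refuses connections under concurrent load, independent of py2neo, is a small probe like the sketch below. It only opens and closes TCP connections; `count_refused` and its parameters are hypothetical names, and the `connect` argument exists so the function can be exercised without a server.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def count_refused(host, port, attempts=50, concurrency=8,
                  connect=socket.create_connection):
    """Open `attempts` TCP connections, `concurrency` at a time.

    Returns how many attempts were refused. Other errors (timeouts,
    for example) are not caught and will propagate to the caller.
    """

    def try_once(_):
        try:
            sock = connect((host, port), 2.0)  # 2-second connect timeout
        except ConnectionRefusedError:
            return 1
        sock.close()
        return 0

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return sum(pool.map(try_once, range(attempts)))
```

Calling `count_refused("127.0.0.1", 7474, concurrency=N)` for a few values of N (the host here is an assumption; the port is the one from the question) would show whether refusals appear only above some level of concurrency. A non-zero count would point at the server or the network rather than at the client library.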


Solution

  • This looks like a networking or web stack configuration problem, so I don't think I can help from a py2neo perspective. I'd recommend upgrading to py2neo 1.6 though, as the client HTTP code has been completely rewritten and it might handle a reconnection more gracefully.
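Until the underlying cause is found, a stopgap is to retry the failing call a few times. The sketch below is a generic decorator, not part of py2neo or neomodel. `retry_on_refused` and its parameters are hypothetical names, and the exception types to catch are passed in because it is not confirmed here which exception class the client library surfaces to the caller.

```python
import functools
import time


def retry_on_refused(attempts=3, delay=0.1,
                     exceptions=(ConnectionRefusedError,), sleep=time.sleep):
    """Retry the wrapped call on the given exceptions, doubling the wait each time."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = delay
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise  # out of retries: surface the original error
                    sleep(wait)
                    wait *= 2
        return wrapper
    return decorator
```

Wrapping a lookup helper with `@retry_on_refused()` would hide an occasional refusal from the caller, but it only masks the symptom: if the refusals come from the server or the network configuration, they still need to be tracked down there.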