Tags: docker, neo4j, fastapi, traefik

Neo4j Docker Compose with Python FastAPI, Traefik on HTTPS: "Connection refused"


I'm using Docker Compose with multiple containers (a custom version of Full Stack FastAPI, but with Neo4j added).

Full docker-compose.yml here, and this is the excerpt for the neo4j service:

  neo4j:
    image: neo4j
    networks:
      - ${TRAEFIK_PUBLIC_NETWORK?Variable not set}
      - default
    ports:
      - "6477:6477"
      - "7474:7474"
      - "7687:7687"
    volumes:
      - app-neo4j-data:/data
      - app-neo4j-plugins:/plugins
    env_file:
      - .env
    environment:
      - NEO4J_ACCEPT_LICENSE_AGREEMENT=yes
      - NEO4J_AUTH=${NEO4J_USERNAME?Variable not set}/${NEO4J_PASSWORD?Variable not set}
      - NEO4J_dbms_default__advertised__address=0.0.0.0
      - NEO4J_dbms_default__listen__address=0.0.0.0
      - NEO4J_dbms_connector_bolt_advertised__address=0.0.0.0:7687
      - NEO4J_dbms_connector_bolt_listen__address=0.0.0.0:7687
      - NEO4J_dbms_connector_http_listen__address=:7474
      - NEO4J_dbms_connector_http_advertised__address=:7474
      - NEO4J_dbms_connector_https_listen__address=:6477
      - NEO4J_dbms_connector_https_advertised__address=:6477
      - NEO4J_dbms_connector_bolt_listen__address=:7687
      - NEO4J_dbms_mode=SINGLE
    deploy:
      labels:
        - traefik.enable=true
        - traefik.docker.network=${TRAEFIK_PUBLIC_NETWORK?Variable not set}
        - traefik.constraint-label=${TRAEFIK_PUBLIC_TAG?Variable not set}
        - traefik.http.routers.${STACK_NAME?Variable not set}-neo4j-http.rule=Host(`neo4j.${DOMAIN?Variable not set}`)
        - traefik.http.routers.${STACK_NAME?Variable not set}-neo4j-http.entrypoints=http
        - traefik.http.routers.${STACK_NAME?Variable not set}-neo4j-http.middlewares=${STACK_NAME?Variable not set}-https-redirect
        - traefik.http.routers.${STACK_NAME?Variable not set}-neo4j-https.rule=Host(`neo4j.${DOMAIN?Variable not set}`)
        - traefik.http.routers.${STACK_NAME?Variable not set}-neo4j-https.entrypoints=https
        - traefik.http.routers.${STACK_NAME?Variable not set}-neo4j-https.tls=true
        - traefik.http.routers.${STACK_NAME?Variable not set}-neo4j-https.tls.certresolver=le
        - traefik.http.services.${STACK_NAME?Variable not set}-neo4j.loadbalancer.server.port=7474
      replicas: 1
      resources:
        limits:
          memory: 1024M
        reservations:
          memory: 500M
      restart_policy:
        condition: on-failure

I try to reach Bolt from the backend with bolt://login:password@neo4j:7687, but I get the following error:

neo4j.exceptions.ServiceUnavailable: Couldn't connect to neo4j:7687 (resolved to ('10.0.3.11:7687',)):
Failed to establish connection to ResolvedIPv4Address(('10.0.3.11', 7687)) (reason [Errno 111] Connection refused)
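For reference, the Bolt URI above breaks down as follows. This is a small standard-library sketch using `urllib.parse.urlsplit`; it only parses the string and never contacts Neo4j. The host `neo4j` is the Compose service name, and the error message shows it did resolve (to `10.0.3.11`), so name resolution on the network was working; "Connection refused" means nothing accepted the TCP connection on port 7687 at that moment. If you use the official `neo4j` driver directly, credentials are normally passed separately, e.g. `GraphDatabase.driver("bolt://neo4j:7687", auth=(user, password))`.

```python
from urllib.parse import urlsplit

# The connection string from the question (placeholder credentials).
uri = "bolt://login:password@neo4j:7687"
parts = urlsplit(uri)

print(parts.scheme)    # bolt
print(parts.username)  # login
print(parts.password)  # password
print(parts.hostname)  # neo4j  (the Compose service name, resolved by Docker's DNS)
print(parts.port)      # 7687
```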

I have reviewed an extraordinary number of answers on Stack Overflow, but I'm not getting anywhere. This does work in development, but I haven't set up HTTPS there, so I'm not sure whether that's what is causing the problem.

I'm at a loss and would appreciate any guidance.


Solution

  • This particular issue wasn't directly related to Docker or Neo4j. Rather, the database takes a finite amount of time to initialise, and the backend was trying to connect before it was ready. The solution is to retry the connection. Here's how I went about it:

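    import logging

    from tenacity import after_log, before_log, retry, stop_after_attempt, wait_fixed

    logger = logging.getLogger(__name__)

    max_tries = 60 * 5  # 300 attempts x 1 s wait: roughly 5 minutes
    wait_seconds = 1

    @retry(
        stop=stop_after_attempt(max_tries),
        wait=wait_fixed(wait_seconds),
        before=before_log(logger, logging.INFO),
        after=after_log(logger, logging.WARNING),
    )
    def init_neo4j() -> None:
        # Any exception raised here (e.g. ServiceUnavailable while Neo4j is
        # still starting) makes tenacity wait and call the function again;
        # after max_tries failures tenacity raises a RetryError.
        init_gdb()
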

    Here, init_gdb() is the script that initialises Neo4j.
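If you would rather not add tenacity as a dependency, the same fixed-wait retry can be sketched with the standard library alone. `retry_fixed`, `flaky_init` and the attempt counts are hypothetical names chosen for illustration; in the real backend the wrapped function would be `init_gdb()`, and you would probably catch only connection errors (for example `neo4j.exceptions.ServiceUnavailable`) instead of every exception. Unlike tenacity, this sketch re-raises the original exception once it runs out of attempts.

```python
import logging
import time
from typing import Callable, Tuple, Type, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")


def retry_fixed(
    func: Callable[[], T],
    max_tries: int = 60 * 5,
    wait_seconds: float = 1.0,
    exceptions: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call func until it returns, sleeping wait_seconds between failed attempts.

    After max_tries failures the last exception is re-raised unchanged.
    """
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    for attempt in range(1, max_tries + 1):
        try:
            return func()
        except exceptions as exc:
            if attempt == max_tries:
                raise  # out of attempts: surface the real connection error
            logger.warning("Attempt %d/%d failed: %s", attempt, max_tries, exc)
            time.sleep(wait_seconds)
    raise AssertionError("unreachable")


# Hypothetical stand-in for init_gdb(): refuses the first two calls,
# the way a database that is still starting up would.
calls = {"n": 0}


def flaky_init() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionRefusedError("database not ready yet")
    return "initialised"


result = retry_fixed(flaky_init, max_tries=5, wait_seconds=0.01)
print(result, "after", calls["n"], "attempts")  # initialised after 3 attempts
```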

    The issue arises after Docker has started the containers and the backend server begins to initialise itself. When the admin user is created, the database must already be up and running; if it isn't, you get this error.
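An alternative, or complement, to retrying the whole initialisation is to wait until the Bolt port accepts TCP connections before calling `init_gdb()`. The standard-library sketch below assumes a hypothetical helper named `wait_for_port`; `neo4j` and `7687` are the service name and port from the compose file. An open port only shows that Neo4j is listening, which does not necessarily mean the database is ready to serve queries, so keeping the retry around the real initialisation is still sensible.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 300.0, interval: float = 1.0) -> bool:
    """Return True once host:port accepts a TCP connection, False after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection resolves the name and performs the TCP handshake.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # ConnectionRefusedError (nothing listening yet), socket.gaierror
            # (name not resolvable yet) and socket.timeout are all OSError subclasses.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Inside the backend container this would be called as `wait_for_port("neo4j", 7687)` before `init_gdb()`, failing the startup if it returns False.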

    Because the same error covers a multitude of issues, it isn't obvious whether the database simply isn't ready yet or whether some setting used to reach it is wrong.
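One rough way to tell these cases apart is to probe the address directly and look at which low-level error comes back. In this standard-library sketch, `diagnose` and its labels are hypothetical. A name-resolution failure points to a wrong host name or network setting. A refused connection means the host is reachable but nothing is listening on that port, either because Neo4j hasn't started yet or because it is bound to a different address or port. A timeout often suggests packets are being dropped, for example because the containers don't share a network.

```python
import socket


def diagnose(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify why a TCP connection to host:port does or does not succeed."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok: port is accepting connections"
    except socket.gaierror:
        return "dns: host name could not be resolved (wrong service name or network?)"
    except ConnectionRefusedError:
        return "refused: host reachable, but nothing listening on this port (not started yet?)"
    except socket.timeout:
        return "timeout: no reply (firewall, or containers not on a shared network?)"
    except OSError as exc:
        return f"other: {exc}"
```

Run from inside the backend container, e.g. `print(diagnose("neo4j", 7687))`. Given the Errno 111 in the question, you would expect it to report `refused` while Neo4j is still starting.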