I have a Docker Swarm with one node and three services: routing_api (1 container), service_api (1 container), and other_service_api (1 container).
routing_api is a container in which supervisord runs nginx and uWSGI in front of a Flask (Python) application. The application receives requests at specified endpoints, then makes a request to the other_service_api container. supervisord also runs a second script, a Google Cloud Pub/Sub subscriber. When the subscriber pulls a message, it sends a request to service_api.
Everything works well for routing_api <> service_api.
However, for routing_api <> other_service_api, requests fail most of the time and succeed only occasionally. Requests are sent like this:
import requests
requests.post('http://service_api:80/api')
The traceback is as follows:
HTTPConnectionPool(host='service_api', port=80): Max retries exceeded with url: /api (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fd1e99e8730>: Failed to establish a new connection: [Errno -2] Name does not resolve'))
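The "[Errno -2] Name does not resolve" part of the traceback indicates that the hostname lookup failed, so no HTTP request ever left the container. A minimal sketch for checking that lookup on its own, using only the standard library, might look like this. The helper name can_resolve is hypothetical and not part of the original setup.

```python
import socket


def can_resolve(host, port=80):
    """Return True if `host` resolves to at least one address, else False."""
    try:
        # getaddrinfo performs the name lookup that an HTTP client needs
        # before it can open a connection to `host`.
        return len(socket.getaddrinfo(host, port)) > 0
    except socket.gaierror:
        # Raised when the lookup fails, e.g. "Name does not resolve".
        return False
```

Calling can_resolve("service_api") from the script that fails, and again from an interactive shell in the same container, would show whether the two actually see the same name resolution.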
If I SSH into the host, run docker exec -it routing_api_container_id "/bin/sh", start a Python interpreter, and then run:
import requests
import time

while True:
    requests.post('http://service_api:80/api')
    time.sleep(1)
I only receive responses with the 200 status code.
The services are deployed via docker stack deploy and docker-compose files.
routing compose file:
version: '3.4'
networks:
  service:
  other_service:
services:
  api:
    image: <image_name>
    networks:
      - service
service compose file:
version: '3.4'
networks:
  routing_service:
    external: true
services:
  api:
    image: <image_name>
    networks:
      - routing_service
This has never happened before, and I am at a loss as to what the cause could be. The bug is difficult to reproduce. Has anyone encountered this before? Note that everything is set up in exactly the same way for other_service; the only difference is that the requests are made from another Python executable.
This turned out to be a simple configuration/environment mistake and had nothing to do with Swarm.
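The exact setting that was wrong is not stated here. Since the same request succeeded from a docker exec shell but failed from the supervisord-managed process, one way to hunt for this kind of mistake is to compare the environment seen by each. The sketch below is only an illustration under that assumption; dump_env and diff_env are hypothetical helper names, and any file paths passed to them are placeholders.

```python
import json
import os


def dump_env(path):
    """Write the current process environment to `path` as JSON."""
    with open(path, "w") as f:
        json.dump(dict(os.environ), f, indent=2, sort_keys=True)


def diff_env(a, b):
    """Return {key: (value_in_a, value_in_b)} for keys whose values differ.

    A key missing from one side is reported with None for that side.
    """
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }
```

Calling dump_env once from the failing script and once from the interactive shell, then loading both files with json.load and passing the two dictionaries to diff_env, lists the variables that differ between the two processes.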