Search code examples
prefect

Prefect docker agent failing to update flow status after execution


Summary

I am trying to understand Prefect docker agents. To do so, I am trying to configure a minimal set up on my local machine. I have managed to get the docker agent to connect to the local server and it looks like it's running the flow. However, it seems that, after the flow is finished, the agent is unable to update the flow's state at the server because it cannot connect to the server back end.

Detail

Here is my flow:

import prefect
from prefect import task, Flow
from prefect.run_configs import DockerRun

@task
def say_hello():
    logger = prefect.context.get("logger")
    logger.info("Hello, docker!")

with Flow("docker-hello-flow") as flow:
    flow.run_config = DockerRun()
    say_hello()

# Register the flow under the "tutorial" project
flow.register(project_name="tutorial")

I configure back end to use a local core server:

prefect backend server

I then start the server:

prefect server start -d

I connect to the server UI on localhost:8080 and confirm that it's running.

In the UI, I create the project tutorial.

I then register the flow:

:; python src/hello_docker.py 
Flow URL: http://localhost:8080/default/flow/fea8211e-c243-40c8-a01e-f63ab2afcc77
 └── ID: 0a7a6cc4-1e7b-4e71-a900-90dffb4362a9
 └── Project: tutorial
 └── Labels: ['parami']

I then confirm that the flow appears in the UI as expected. Note that my machine name is parami hence the label parami.

I then start a local docker agent specifying the label parami.

:; prefect agent docker start -l parami --log-level DEBUG --show-flow-logs

I then run the flow via the UI. The flow run is called enigmatic-axolotl.

The docker agent's log is as follows:

[2021-11-16 22:47:17,076] DEBUG - agent | No ready flow runs found.
[2021-11-16 22:47:17,076] DEBUG - agent | Sleeping flow run poller for 2.0 seconds...
[2021-11-16 22:47:18,506] DEBUG - agent | {'status': 'Pulling from prefecthq/prefect', 'id': '0.15.7'}
[2021-11-16 22:47:18,509] DEBUG - agent | {'status': 'Digest: sha256:e3f6dece4c8d5d7b289cb6e017d3a3b0617d3084df57c2bb96999a2b3c2470f0'}
[2021-11-16 22:47:18,510] DEBUG - agent | {'status': 'Status: Image is up to date for prefecthq/prefect:0.15.7'}
[2021-11-16 22:47:18,513] INFO - agent | Successfully pulled image prefecthq/prefect:0.15.7
[2021-11-16 22:47:18,513] DEBUG - agent | Creating Docker container prefecthq/prefect:0.15.7
[2021-11-16 22:47:18,578] DEBUG - agent | Starting Docker container with ID 68fea87be44f0332568b2a01e391c5e1822f58f43438df9bb81e228a8edc9625 and name 'enigmatic-axolotl'
[2021-11-16 22:47:18,976] DEBUG - agent | Docker container 68fea87be44f0332568b2a01e391c5e1822f58f43438df9bb81e228a8edc9625 started
[2021-11-16 22:47:18,977] INFO - agent | Completed deployment of flow run b1df1042-96a7-4a09-aee0-820468eccf87
[2021-11-16 22:47:19,076] DEBUG - agent | Querying for ready flow runs...
[2021-11-16 22:47:19,101] DEBUG - agent | No ready flow runs found.
[2021-11-16 22:47:19,101] DEBUG - agent | Sleeping flow run poller for 4.0 seconds...
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 175, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw
  File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 96, in create_connection
    raise err
  File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 86, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 706, in urlopen
    chunked=chunked,
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 394, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 239, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File "/usr/local/lib/python3.7/http/client.py", line 1281, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/local/lib/python3.7/http/client.py", line 1327, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.7/http/client.py", line 1276, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.7/http/client.py", line 1036, in _send_output
    self.send(msg)
  File "/usr/local/lib/python3.7/http/client.py", line 976, in send
    self.connect()
  File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 205, in connect
    conn = self._new_conn()
  File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 187, in _new_conn
    self, "Failed to establish a new connection: %s" % e
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f2e7172e690>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 756, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/usr/local/lib/python3.7/site-packages/urllib3/util/retry.py", line 574, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='host.docker.internal', port=4200): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2e7172e690>: Failed to establish a new connection: [Errno 111] Connection refused'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/prefect", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/prefect/cli/execute.py", line 53, in flow_run
    result = client.graphql(query)
  File "/usr/local/lib/python3.7/site-packages/prefect/client/client.py", line 554, in graphql
    retry_on_api_error=retry_on_api_error,
  File "/usr/local/lib/python3.7/site-packages/prefect/client/client.py", line 458, in post
    retry_on_api_error=retry_on_api_error,
  File "/usr/local/lib/python3.7/site-packages/prefect/client/client.py", line 738, in _request
    session=session, method=method, url=url, params=params, headers=headers
  File "/usr/local/lib/python3.7/site-packages/prefect/client/client.py", line 606, in _send_request
    timeout=prefect.context.config.cloud.request_timeout,
  File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 590, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='host.docker.internal', port=4200): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2e7172e690>: Failed to establish a new connection: [Errno 111] Connection refused'))
[2021-11-16 22:47:23,101] DEBUG - agent | Querying for ready flow runs...
[2021-11-16 22:47:23,181] DEBUG - agent | No ready flow runs found.
[2021-11-16 22:47:23,181] DEBUG - agent | Sleeping flow run poller for 8.0 seconds...

So, the agent is successfully getting the flow run enigmatic-axolotl from the server and appears to complete the execution. My understanding is that it is then attempting to connect to the server to update the flow run's status. However, it is failing to do so because it is failing to connect to host.docker.internal:4200.

I wondered if host.docker.internal is a valid host so I restarted the agent with the option -a http://localhost:4200. The agent successfully connects to the server at localhost:4200 (it's reported in the log as doing so) but, when I run the flow again, I get the same error as before; that is, it is failing to connect to host.docker.internal:4200.

Finally, I reran the agent with -a http://0.0.0.0:4200. Again, it successfully connects to the server. I then rerun the flow and it fails yet again. However, this time it is trying to connect to 0.0.0.0:4200:

    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='0.0.0.0', port=4200): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f70738948d0>: Failed to establish a new connection: [Errno 111] Connection refused'))

What am I missing? I assume that there's some configuration that I need to set to make this work.


Solution

  • From Prefect Server 0.15.5 and up, you might need to do

    prefect server start --expose. This will allow outside connections to server.

    You can find more information in this Github issue: https://github.com/PrefectHQ/prefect/issues/4963