
Dask: How to Add Security (TLS/SSL) to Dask Cluster?


I'm trying to add a security layer to my Dask cluster, which is deployed with Helm on GKE (GCP), so that users must supply the certificate and key files through a Security object, as explained in the documentation [1]. Unfortunately, the client times out because the scheduler pod crashes. The scheduler logs show the following error:

Traceback (most recent call last):
  File "/opt/conda/bin/dask-scheduler", line 10, in <module>
    sys.exit(go())
  File "/opt/conda/lib/python3.7/site-packages/distributed/cli/dask_scheduler.py", line 226, in go
    main()
  File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/distributed/cli/dask_scheduler.py", line 206, in main
    **kwargs
  File "/opt/conda/lib/python3.7/site-packages/distributed/scheduler.py", line 1143, in __init__
    self.connection_args = self.security.get_connection_args("scheduler")
  File "/opt/conda/lib/python3.7/site-packages/distributed/security.py", line 224, in get_connection_args
    "ssl_context": self._get_tls_context(tls, ssl.Purpose.SERVER_AUTH),
  File "/opt/conda/lib/python3.7/site-packages/distributed/security.py", line 187, in _get_tls_context
    ctx = ssl.create_default_context(purpose=purpose, cafile=ca)
  File "/opt/conda/lib/python3.7/ssl.py", line 584, in create_default_context
    context.load_verify_locations(cafile, capath, cadata)
FileNotFoundError: [Errno 2] No such file or directory

The Helm config YAML file is as follows:

scheduler:
  allowed-failures: 5
  env:
    - name: DASK_DISTRIBUTED__COMM__DEFAULT_SCHEME
      value: "tls"
    - name: DASK_DISTRIBUTED__COMM__REQUIRE_ENCRYPTION
      value: "true"
    - name: DASK_DISTRIBUTED__COMM__TLS__CA_FILE
      value: "myca.pem"
    - name: DASK_DISTRIBUTED__COMM__TLS__SCHEDULER__KEY
      value: "mykey.pem"
    - name: DASK_DISTRIBUTED__COMM__TLS__SCHEDULER__CERT
      value: "myca.pem"
    - name: DASK_DISTRIBUTED__COMM__TLS__WORKER__KEY
      value: "mykey.pem"
    - name: DASK_DISTRIBUTED__COMM__TLS__WORKER__CERT
      value: "myca.pem"
    - name: DASK_DISTRIBUTED__COMM__TLS__CLIENT__KEY
      value: "mykey.pem"
    - name: DASK_DISTRIBUTED__COMM__TLS__CLIENT__CERT
      value: "myca.pem"

I create the key and certificate files as follows:

openssl req -newkey rsa:4096 -nodes -sha256 -x509 -days 3650 -out myca.pem -keyout mykey.pem
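
Before deploying, it can help to confirm locally that the generated files load as a TLS identity. The following is a minimal sketch using only Python's standard `ssl` module; the helper name `check_tls_files` is hypothetical and not part of Dask. It assumes the file names from the command above and only checks that the files can be read and that the key matches the certificate.

```python
import ssl


def check_tls_files(ca_file, cert_file, key_file):
    """Return None if the files load as a TLS identity, else an error message."""
    try:
        # Loads the CA file; raises FileNotFoundError if the path does not exist.
        ctx = ssl.create_default_context(purpose=ssl.Purpose.SERVER_AUTH, cafile=ca_file)
        # Raises ssl.SSLError if the private key does not match the certificate.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    except OSError as exc:  # FileNotFoundError and ssl.SSLError are OSError subclasses
        return f"{type(exc).__name__}: {exc}"
    return None


if __name__ == "__main__":
    # The self-signed certificate is used as both the CA file and the certificate.
    problem = check_tls_files("myca.pem", "myca.pem", "mykey.pem")
    print(problem or "certificate, key and CA file load correctly")
```

A non-empty message here points at the files themselves rather than at the cluster configuration.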

Here is a minimal, complete, verifiable example:

import dask.dataframe as dd
from dask.distributed import Client
from distributed.security import Security

# The self-signed certificate doubles as the CA file and the client certificate.
sec = Security(tls_ca_file='myca.pem',
               tls_client_cert='myca.pem',
               tls_client_key='mykey.pem',
               require_encryption=True)

with Client("tls://<scheduler_ip>:8786", security=sec) as dask_client:
    ddf = dd.read_csv('gs://<bucket_name>/my_file.csv',
                      engine='python',
                      error_bad_lines=False,
                      encoding="utf-8",
                      assume_missing=True
                      )

    print(ddf.shape[0].compute())

[1] https://distributed.dask.org/en/latest/tls.html


Solution

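  • I resolved the issue. The certificate files did not exist inside the containers, so the paths in the config could not be resolved (hence the FileNotFoundError). Both the Dask workers and the scheduler need the certificate settings in their config, and the certificates need to be baked into the Docker image as well. See the full config below: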

    Dockerfile

    FROM daskdev/dask
    
    RUN conda install --yes \
        -c conda-forge \
        python==3.7
    
    ADD certs /certs/
    
    ENTRYPOINT ["tini", "-g", "--", "/usr/bin/prepare.sh"]
    

    Helm Config

    worker:
      name: worker
      image:
        repository: "gcr.io/PROJECT_ID/mydask"
        tag: "latest"
      env:
        - name: DASK_DISTRIBUTED__COMM__DEFAULT_SCHEME
          value: "tls"
        - name: DASK_DISTRIBUTED__COMM__REQUIRE_ENCRYPTION
          value: "true"
        - name: DASK_DISTRIBUTED__COMM__TLS__CA_FILE
          value: "certs/myca.pem"
        - name: DASK_DISTRIBUTED__COMM__TLS__SCHEDULER__KEY
          value: "certs/mykey.pem"
        - name: DASK_DISTRIBUTED__COMM__TLS__SCHEDULER__CERT
          value: "certs/myca.pem"
        - name: DASK_DISTRIBUTED__COMM__TLS__WORKER__KEY
          value: "certs/mykey.pem"
        - name: DASK_DISTRIBUTED__COMM__TLS__WORKER__CERT
          value: "certs/myca.pem"
        - name: DASK_DISTRIBUTED__COMM__TLS__CLIENT__KEY
          value: "certs/mykey.pem"
        - name: DASK_DISTRIBUTED__COMM__TLS__CLIENT__CERT
          value: "certs/myca.pem"
    
    scheduler:
      name: scheduler
      image:
        repository: "gcr.io/PROJECT_ID/mydask"
        tag: "latest"
      env:
        - name: DASK_DISTRIBUTED__COMM__DEFAULT_SCHEME
          value: "tls"
        - name: DASK_DISTRIBUTED__COMM__REQUIRE_ENCRYPTION
          value: "true"
        - name: DASK_DISTRIBUTED__COMM__TLS__CA_FILE
          value: "certs/myca.pem"
        - name: DASK_DISTRIBUTED__COMM__TLS__SCHEDULER__KEY
          value: "certs/mykey.pem"
        - name: DASK_DISTRIBUTED__COMM__TLS__SCHEDULER__CERT
          value: "certs/myca.pem"
        - name: DASK_DISTRIBUTED__COMM__TLS__WORKER__KEY
          value: "certs/mykey.pem"
        - name: DASK_DISTRIBUTED__COMM__TLS__WORKER__CERT
          value: "certs/myca.pem"
        - name: DASK_DISTRIBUTED__COMM__TLS__CLIENT__KEY
          value: "certs/mykey.pem"
        - name: DASK_DISTRIBUTED__COMM__TLS__CLIENT__CERT
          value: "certs/myca.pem"
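
To confirm that every configured path actually resolves inside a scheduler or worker container, a small check along these lines may be useful. This is a sketch, not part of Dask: the helper `missing_tls_files` is hypothetical, and it assumes that only variables ending in `__CA_FILE`, `__KEY`, or `__CERT` hold file paths. Relative values such as `certs/myca.pem` are resolved against the working directory of the process that runs the check, so it should be run from the same directory as the Dask process.

```python
import os

# Assumption: only these suffixes name files; other TLS settings are skipped.
FILE_SUFFIXES = ("__CA_FILE", "__KEY", "__CERT")


def missing_tls_files(environ=None, prefix="DASK_DISTRIBUTED__COMM__TLS__"):
    """Return {env_var: path} for TLS file settings whose path is not a file."""
    environ = os.environ if environ is None else environ
    return {
        name: path
        for name, path in environ.items()
        if name.startswith(prefix)
        and name.endswith(FILE_SUFFIXES)
        and not os.path.isfile(path)
    }


if __name__ == "__main__":
    missing = missing_tls_files()
    for name, path in sorted(missing.items()):
        print(f"{name} -> {path!r} not found (cwd={os.getcwd()})")
    if not missing:
        print("all configured TLS files exist")
```

An empty result means the FileNotFoundError from the question should no longer occur for these settings.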