python-3.x apache-kafka openssl confluent-platform

Can't establish SSL connection to Kafka after upgrading to Python 3.7


Code that successfully connects to Kafka over SSL in Python 3.6.7 fails in Python 3.7.3 with the error message SSL: WRONG_VERSION_NUMBER. I would not expect code that works in Python 3.6 to fail in Python 3.7. How can I resolve this error and connect to Kafka via SSL with Python 3.7.3?

I have tried several things to troubleshoot:

  • Use a different package to connect to Kafka (connection with faust produces basically the same error)
  • Use a different cipher suite (and verify that setting them incompatibly changes the error to no cipher suites in common)
  • Use a different version of the Kafka container (Confluent's Kafka 5.0.0 container with Python 3.7 returns a no cipher suites in common error, 5.1.3 does not change the error)
  • Use a different protocol (enabling only TLS 1.1 or 1.2 does not change the error, enabling 1.1 on Kafka and 1.2 on Python or vice versa causes a failure in name resolution)
  • Use a different version of OpenSSL (this error was originally found using OpenSSL 1.1.1c, and reproduced on 1.1.1a; for the current reproduction both containers use 1.1.1b)
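
For anyone repeating the protocol experiment, here is a minimal sketch of pinning the Python side to TLS 1.2 with the standard library. The helper name `make_tls12_only_context` is hypothetical, and `minimum_version` / `maximum_version` only exist from Python 3.7 on, so this will not run unchanged in the 3.6 container. Whether a given Kafka client accepts an `SSLContext` directly depends on the library, so treat this as an illustration rather than the exact code in the repo.

```python
import ssl


def make_tls12_only_context(cafile=None):
    """Build a client context that will only negotiate TLS 1.2 (hypothetical helper)."""
    # create_default_context enables certificate and hostname verification.
    ctx = ssl.create_default_context(cafile=cafile)
    # Requires Python 3.7+ linked against OpenSSL 1.1.0g or newer.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.maximum_version = ssl.TLSVersion.TLSv1_2
    return ctx


if __name__ == "__main__":
    context = make_tls12_only_context()
    print(context.minimum_version, context.maximum_version)
```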

Reproducing this problem could be fairly involved. It requires running Kafka and Zookeeper alongside two different, comparable versions of Python, plus the complete set of SSL credentials that each of those requires. Thankfully, Docker can take care of much of this for us. I have created a GitHub repo that contains a minimal set of files required to reproduce the error using only Docker Desktop:

https://github.com/r-archer37/python-kafka-mre

The exact steps to reproduce the error are in the README. The short version is that there are two docker-compose files whose only difference is the version of the Jupyter-provided Python-based Docker image. Each runs a simple script that installs pykafka and then attempts to connect to the Kafka container. The container with Python 3.6 successfully connects to Kafka (console output looks like DEBUG:pykafka.connection:Successfully connected to b'kafka':9092), while the container with Python 3.7 fails to connect (console output looks like INFO:pykafka.connection:Attempt 0: failed to connect to kafka:9092 ... INFO:pykafka.connection:[SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1056)).
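
Since the two images may differ in more than the Python minor version, it can help to print what each interpreter's `ssl` module actually reports. The following is a small diagnostic sketch, not part of the repo; the function name `tls_environment` is made up for illustration, and it uses only standard-library calls that exist in both 3.6 and 3.7.

```python
import ssl
import sys


def tls_environment():
    """Collect a few facts about the interpreter's TLS stack (hypothetical helper)."""
    return {
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        # The OpenSSL build the ssl module is linked against.
        "openssl": ssl.OPENSSL_VERSION,
        # ssl.HAS_TLSv1_3 was added in Python 3.7, so fall back to False.
        "has_tlsv1_3": getattr(ssl, "HAS_TLSv1_3", False),
        # Number of cipher suites enabled in a default client context.
        "default_cipher_count": len(ssl.create_default_context().get_ciphers()),
    }


if __name__ == "__main__":
    for key, value in tls_environment().items():
        print(key, "=", value)
```

Running this in both containers and comparing the output side by side is one way to spot differences in the TLS configuration that the image tags alone do not reveal.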

Fixes, and suggestions of things to try, are both welcome!

Edit: The solution appears to be to use a Kafka Docker image from a different organization, not Confluent.


Solution

  • This is quite weird. Based on my investigation, I suspect the Python upgrade is bringing to light an issue with Kafka; you might want to file a bug report with them. I was able to reproduce both the success with the Python 3.6 container and the failure with the 3.7 one.

    I captured Wireshark traces of both. With 3.6, the client sends the TLS Client Hello message, and the server responds with a valid Server Hello, completing the handshake. With 3.7, when the client sends the Client Hello message, the server responds with a run of repeated 0x00 bytes. 0x00 0x00 is not a valid TLS version, hence the WRONG_VERSION_NUMBER that OpenSSL reports.

    When trying to create a TLS connection to Kafka using the OpenSSL client from either container, the server also responds to the client handshake with just a series of 0x00 bytes. The OpenSSL client command I used: openssl s_client -connect kafka:9092 -cert mre.pem -CAfile mre.pem -key mre.pem -state -debug -tls1_2
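
To illustrate why a reply of zero bytes surfaces as a version error, here is a rough sketch that needs neither Kafka nor a network. The first function decodes the 5-byte TLS record header (content type, two version bytes, length); the second feeds an arbitrary reply to a Python client handshake through in-memory buffers and returns the resulting error. Both function names are made up for this example, and the header check is a simplification, not OpenSSL's actual logic.

```python
import ssl
import struct


def describe_record_header(data: bytes) -> str:
    """Decode the 5-byte TLS record header: type, version (major, minor), length."""
    content_type, major, minor, length = struct.unpack("!BBBH", data[:5])
    if major != 3:
        # Every SSL 3.0 / TLS 1.x record carries major version 3.
        return "invalid record version 0x{:02x} 0x{:02x}".format(major, minor)
    return "type {}, version 3.{}, {} payload bytes".format(content_type, minor, length)


def handshake_error_for_reply(reply: bytes) -> str:
    """Start a client handshake in memory and answer the Client Hello with `reply`."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False          # no real server, so skip verification
    ctx.verify_mode = ssl.CERT_NONE
    incoming = ssl.MemoryBIO()          # bytes "received from the server"
    outgoing = ssl.MemoryBIO()          # bytes the client would send
    tls = ctx.wrap_bio(incoming, outgoing, server_side=False)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass                            # Client Hello written; waiting for a reply
    incoming.write(reply)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        return "incomplete: client wants more data"
    except ssl.SSLError as exc:
        return exc.reason or str(exc)
    return "handshake completed"


if __name__ == "__main__":
    zeros = bytes(64)
    print(describe_record_header(zeros))
    print(handshake_error_for_reply(zeros))
```

If the error printed for the all-zero reply matches the one pykafka logs, that supports the reading that the client is behaving correctly and the server's response is the real problem.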