Search code examples
apache-sparkibm-cloudmessage-hubdata-science-experiencedsx

How to connect to Message Hub from Data Science Experience / Spark as a Service using confluent-kafka-python?


The Bluemix MessageHub docs point python users to the confluent kafka library:

enter image description here

So I tried to install:

!pip install --user confluent-kafka

However, I've hit this error:

Collecting confluent-kafka
  Using cached confluent-kafka-0.9.1.2.tar.gz
Installing collected packages: confluent-kafka
  Running setup.py install for confluent-kafka ... - \ error
    Complete output from command /usr/local/src/bluemix_jupyter_bundle.v22/notebook/bin/python -u -c "import setuptools, tokenize;__file__='/gpfs/global_fs01/sym_shared/YPProdSpark/user/xxxx/notebook/tmp/pip-build-N3zDUh/confluent-kafka/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /gpfs/fs01/user/xxxx/notebook/tmp/pip-PyAwq2-record/install-record.txt --single-version-externally-managed --compile --user --prefix=:
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-2.7
    creating build/lib.linux-x86_64-2.7/confluent_kafka
    copying confluent_kafka/__init__.py -> build/lib.linux-x86_64-2.7/confluent_kafka
    creating build/lib.linux-x86_64-2.7/confluent_kafka/kafkatest
    copying confluent_kafka/kafkatest/__init__.py -> build/lib.linux-x86_64-2.7/confluent_kafka/kafkatest
    copying confluent_kafka/kafkatest/verifiable_consumer.py -> build/lib.linux-x86_64-2.7/confluent_kafka/kafkatest
    copying confluent_kafka/kafkatest/verifiable_producer.py -> build/lib.linux-x86_64-2.7/confluent_kafka/kafkatest
    copying confluent_kafka/kafkatest/verifiable_client.py -> build/lib.linux-x86_64-2.7/confluent_kafka/kafkatest
    running build_ext
    building 'confluent_kafka.cimpl' extension
    creating build/temp.linux-x86_64-2.7
    creating build/temp.linux-x86_64-2.7/confluent_kafka
    creating build/temp.linux-x86_64-2.7/confluent_kafka/src
    gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/usr/local/src/bluemix_jupyter_bundle.v22/notebook/include/python2.7 -c confluent_kafka/src/confluent_kafka.c -o build/temp.linux-x86_64-2.7/confluent_kafka/src/confluent_kafka.o
    In file included from confluent_kafka/src/confluent_kafka.c:17:0:
    confluent_kafka/src/confluent_kafka.h:20:32: fatal error: librdkafka/rdkafka.h: No such file or directory
     #include <librdkafka/rdkafka.h>
                                    ^
    compilation terminated.
    error: command 'gcc' failed with exit status 1

    ----------------------------------------
Command "/usr/local/src/bluemix_jupyter_bundle.v22/notebook/bin/python -u -c "import setuptools, tokenize;__file__='/gpfs/global_fs01/sym_shared/YPProdSpark/user/xxxx/notebook/tmp/pip-build-N3zDUh/confluent-kafka/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /gpfs/fs01/user/xxxx/notebook/tmp/pip-PyAwq2-record/install-record.txt --single-version-externally-managed --compile --user --prefix=" failed with error code 1 in /gpfs/global_fs01/sym_shared/YPProdSpark/user/xxxx/notebook/tmp/pip-build-N3zDUh/confluent-kafka/

Solution

  • As per the confluent readme, you need the librdkafka native library installed before you can use their Python client.

    As you found in your other comment their is the alternative pure Python library kafka-python which you might find easier to work with for a Bluemix application