
ksqlDB Cluster Failure Across Multiple Bare-Metal Machines Running Docker


I am getting a failure when a ksqlDB server tries to join a cluster and serve requests. I made an image that explains the issue better than I can in writing; the box titled "Cluster Fails" is my issue.

The funny part is that it definitely attempts to cluster, because I get {"@type":"statement_error","error_code":40001,"message":"Unable to execute pull query: ..."} when I make a call to 192.168.150.125:8087.

[Image: diagram of the setup; the failing case is the box titled "Cluster Fails"]

@Robin Moffatt: as for the ksqlDB version, it is the latest. The Docker image used is

image: confluentinc/ksqldb-server

As for the log of the container on 192.168.150.125, this is what I see:

~/docker/images/ksqldb# docker logs 0ea930c887f8
===> Configuring ksqlDB...
===> Launching ksqlDB Server...
OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
[2020-06-22 14:38:07,797] INFO KsqlConfig values:
    ksql.access.validator.enable = auto
    ksql.any.key.name.enabled = false
    ksql.authorization.cache.expiry.time.secs = 30
    ksql.authorization.cache.max.entries = 10000
    ksql.connect.url = http://localhost:8083
    ksql.connect.worker.config =
    ksql.extension.dir = ext
    ksql.hidden.topics = [_confluent.*, __confl.........

No error shows up at all in the docker logs [container-id] output. I have gone through it from start to finish, including while attempting the query, but nothing in the log says anything like "I CANNOT CONNECT OR JOIN THE CLUSTER" or even "I HAVE TRIED TO JOIN THE CLUSTER". I would have thought that, since this container is trying to join a cluster, there would be some kind of logging about it, but there is nothing.

The error occurs when I run the same query against all three servers. The first two servers respond perfectly, as shown below, but the container on the other machine returns the "Unable to execute pull query:" error.

~/docker/images/ksqldb# curl -X "POST" "http://192.168.150.124:8085/query" \
     -H "Content-Type: application/vnd.ksql.v1+json; charset=utf-8" \
     -d $'{ "ksql": "SELECT * FROM chart_usage_table WHERE ROWKEY=\'73bd8a1d-5a9e-4343-b0e3-878ab5c37529|+|zn|+|09066437|+|2020-05-03\';", "streamsProperties": {} }'
[{"header":{"queryId":"query_1592830554884","schema":"ROWKEY STRING KEY, USERID STRING, ENTITYTYPE STRING, ENTITYID STRING, DATESEARCHED STRING, COUNT BIGINT"}},
{"row":{"columns":["73bd8a1d-5a9e-4343-b0e3-878ab5c37529|+|zn|+|09066437|+|2020-05-03","73bd8a1d-5a9e-4343-b0e3-878ab5c37529","pzn","09066437","2020-05-03",100]}}]

~/docker/images/ksqldb# curl -X "POST" "http://192.168.150.124:8086/query" \
     -H "Content-Type: application/vnd.ksql.v1+json; charset=utf-8" \
     -d $'{ "ksql": "SELECT * FROM chart_usage_table WHERE ROWKEY=\'73bd8a1d-5a9e-4343-b0e3-878ab5c37529|+|zn|+|09066437|+|2020-05-03\';", "streamsProperties": {} }'
[{"header":{"queryId":"query_1592830563312","schema":"ROWKEY STRING KEY, USERID STRING, ENTITYTYPE STRING, ENTITYID STRING, DATESEARCHED STRING, COUNT BIGINT"}},
{"row":{"columns":["73bd8a1d-5a9e-4343-b0e3-878ab5c37529|+|zn|+|09066437|+|2020-05-03","73bd8a1d-5a9e-4343-b0e3-878ab5c37529","pzn","09066437","2020-05-03",100]}}]

~/docker/images/ksqldb# curl -X "POST" "http://192.168.150.125:8087/query" \
     -H "Content-Type: application/vnd.ksql.v1+json; charset=utf-8" \
     -d $'{ "ksql": "SELECT * FROM chart_usage_table WHERE ROWKEY=\'73bd8a1d-5a9e-4343-b0e3-878ab5c37529|+|zn|+|09066437|+|2020-05-03\';", "streamsProperties": {} }'
{"@type":"statement_error","error_code":40001,"message":"Unable to execute pull query: SELECT * FROM chart_usage_table WHERE ROWKEY='73bd8a1d-5a9e-4343-b0e3-878ab5c37529|+|zn|+|09066437|+|2020-05-03';","stackTrace":[],"statementText":"SELECT * FROM chart_usage_table WHERE ROWKEY='73bd8a1d-5a9e-4343-b0e3-878ab5c37529|+|zn|+|09066437|+|2020-05-03';","entities":[]}
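
The same check can be scripted, which avoids the shell quoting around the single-quoted key. What follows is a minimal sketch in Python, assuming only the /query endpoint, content type, and payload shape visible in the curl commands; the helper names build_pull_query_request and run_pull_query are hypothetical and not part of any ksqlDB client.

```python
import json
import urllib.error
import urllib.request

# Endpoint path and content type are copied from the curl commands in the question.
CONTENT_TYPE = "application/vnd.ksql.v1+json; charset=utf-8"


def build_pull_query_request(base_url: str, sql: str) -> urllib.request.Request:
    """Build the POST request that the curl commands send to /query."""
    body = json.dumps({"ksql": sql, "streamsProperties": {}}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/query",
        data=body,
        headers={"Content-Type": CONTENT_TYPE},
        method="POST",
    )


def run_pull_query(base_url: str, sql: str, timeout: float = 10.0) -> str:
    """Send the query and return the raw response body as text."""
    request = build_pull_query_request(base_url, sql)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # An error reply (such as the statement_error above) may arrive with
        # a non-2xx status; return its body too so it can be inspected.
        return err.read().decode("utf-8")
```

Calling run_pull_query once per server (http://192.168.150.124:8085, http://192.168.150.124:8086, http://192.168.150.125:8087) with the SELECT statement from the question should reproduce the three replies shown above. Connection failures surface as OSError and are left to the caller.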

@Andrew Coates,

Thanks for the clarification, but I am now even more confused, since the documentation speaks of joining clusters. That is not important, though. The issue is that your solution does not work for me either: I keep getting a bind exception, which seems correct to me, since a container cannot bind to an address that belongs to its host. I am not a Docker expert, but my knuckles are bloody enough to say that ksqlDB seems unable to resolve something, with no indication of what it is.

ports:
    - "8087:8088"

extra_hosts:
    - "ACCL-FFM-SRV-125:192.168.150.125"
    - "ACCL-FFM-SRV-124:192.168.150.124"

environment:
    KSQL_LISTENERS: http://ACCL-FFM-SRV-125:8088

Caused by: java.net.BindException: Cannot assign requested address
    at java.base/sun.nio.ch.Net.bind0(Native Method)
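
The exception is consistent with how binding works in general: a process can only bind a listening socket to an address assigned to one of its own network interfaces, and inside the container the host's 192.168.150.125 is most likely not such an address. A minimal sketch of the same behavior in Python, assuming that 192.0.2.1 (a reserved documentation address) is not configured on the machine running it; can_bind is a hypothetical helper:

```python
import socket


def can_bind(host: str, port: int = 0) -> bool:
    """Return True if a TCP socket can be bound to host (port 0 = any free port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
            return True
        except OSError:
            # Typically EADDRNOTAVAIL, "Cannot assign requested address".
            return False


# 0.0.0.0 means "all local interfaces", so it is bindable inside a container.
print("0.0.0.0   ->", can_bind("0.0.0.0"))
# Assumption: 192.0.2.1 is not assigned to any local interface, much like the
# host's LAN address is not assigned inside the container.
print("192.0.2.1 ->", can_bind("192.0.2.1"))
```

The configuration in the Solution section follows this split: it listens on http://0.0.0.0:8088 and advertises http://192.168.150.125:8087 separately.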

Here is another thing, though: I can reach a netcat server in the container on the other machine with no problem. So I know that name resolution and networking work fine from a container on one machine to a container on a different machine.

[root@ksqldbservermvcu1 ~]# curl -X "POST" "http://ACCL-FFM-SRV-125:8086/query" \
-d $'NETCAT on container will receive this.'

And in the container, netcat is set up to receive:

[root@ksqldbservermvcu4 ~]# nc -l -p 8086  
POST /query HTTP/1.1  
Host: ACCL-FFM-SRV-124:8086  
User-Agent: curl/7.61.1  
Accept: */*  
Content-Length: 38  
Content-Type: application/x-www-form-urlencoded

NETCAT on container will receive this.

By the way, setting KSQL_ADVERTISED_LISTENER: http://192.168.150.125:8088 is not respected in any way, with or without a trailing 's' (note the 's' in the second log excerpt):

root@ksqldbservermvcu4:~/docker/images/ksqldb# docker logs a746f993b9d9 | grep advertised
    ksql.advertised.listener = null
[2020-07-02 10:44:52,127] WARN The configuration 'advertised.listener' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:355)

root@ksqldbservermvcu4:~/docker/images/ksqldb# docker logs 61fa63d920a3 | grep advertised
    ksql.advertised.listener = null
[2020-07-02 11:12:18,298] WARN The configuration 'advertised.listeners' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:355)
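
The two warnings hint at how the image turns environment variables into properties: the KSQL_ prefix appears to be stripped, the rest lowercased, and underscores turned into dots, so KSQL_ADVERTISED_LISTENER ends up as advertised.listener rather than ksql.advertised.listener. A minimal sketch of that mapping in Python, inferred only from the log lines above and not taken from the image's actual startup script; env_to_property is a hypothetical name:

```python
PREFIX = "KSQL_"


def env_to_property(name: str) -> str:
    """Map an environment variable name to the property it appears to set.

    Assumed rule (inferred from the logs, not from the image's source):
    drop the leading KSQL_, lowercase the rest, replace '_' with '.'.
    """
    if not name.startswith(PREFIX):
        raise ValueError(f"{name!r} does not start with {PREFIX!r}")
    return name[len(PREFIX):].lower().replace("_", ".")


for var in (
    "KSQL_LISTENERS",
    "KSQL_ADVERTISED_LISTENER",
    "KSQL_ADVERTISED_LISTENERS",
    "KSQL_KSQL_ADVERTISED_LISTENER",
):
    print(f"{var} -> {env_to_property(var)}")
```

Under this assumption, only the doubled prefix KSQL_KSQL_ADVERTISED_LISTENER yields ksql.advertised.listener, the property that is logged as null above.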

Maybe I am missing something, please help.


Solution

  • I can confirm that I finally got this working with the configuration shown below in my docker-compose file. I also had to reboot the machine (it had not been rebooted for almost a year) and delete all the topics in the Kafka cluster. All of this combined cleared the way to get it working. Hope it helps someone.

    ksqldb-server-mvcu1:
        image: confluentinc/ksqldb-server
        hostname: ksqldbservermvcu1
        container_name: ksqldbservermvcu1
        volumes:
            - /root/docker/images/ksqldb/docker_mapped_folders/ksqldb:/tmp/ksqldb
        ports:
            - "8087:8088"
            - "31095:31099"
        extra_hosts:
            - "ACCL-FFM-SRV-125:192.168.150.125"
            - "ACCL-FFM-SRV-124:192.168.150.124"
        environment:
            KSQL_LISTENERS: http://0.0.0.0:8088
            KSQL_BOOTSTRAP_SERVERS: 192.168.150.156:9092,192.168.150.145:9092,192.168.150.160:9092
            KSQL_KSQL_LOGGING_PROCESSING_STREAM_AUTO_CREATE: "true"
            KSQL_KSQL_LOGGING_PROCESSING_TOPIC_AUTO_CREATE: "true"
            KSQL_KSQL_SERVICE_ID: "ksqldb_msg_shown_cluster"
            KSQL_KSQL_ADVERTISED_LISTENER: http://192.168.150.125:8087 
    

    BTW: take a look at this post from Robin Moffatt for a great explanation of listeners: https://www.confluent.io/blog/kafka-listeners-explained/