Search code examples
kubernetesapache-kafkaapache-kafka-connectazure-eventhubstrimzi

Strimzi Mirrormaker2 and Event Hub Odd Error, Can't Pinpoint


I am trying to sync Kafka to Azure Event Hub, one way. I followed every tutorial I could find to no avail. Nothing seems to work as I keep getting obscure errors. Below is the config used for deployment. We even used the RootManageSharedAccessKey to make sure nothing blocks. I have Kafka, KafkaConnect deployed without issue.

MirrorMaker2 works between Kafka and Kafka, but no dice when I try to sync with EventHub.

To test if port 9093 was accessible, I successfully used telnet to access it.

MM2 Config

cat <<EOF | kubectl apply -n kafka-cloud -f -
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaMirrorMaker2
metadata:
  name: mirror-maker-eventhub
spec:
  version: 3.1.0
  replicas: 1
  connectCluster: "eventhub"
  clusters:
  - alias: "my-kafka-cluster"
    bootstrapServers: my-kafka-cluster-kafka-bootstrap:9092
  - alias: "eventhub"
    bootstrapServers: XXXXXXXXXXXXXXXXXXXX.servicebus.windows.net:9093
    config:
      config.storage.replication.factor: 1
      offset.storage.replication.factor: 1
      status.storage.replication.factor: 1
      producer.connections.max.idle.ms: 180000
      producer.metadata.max.age.ms: 180000
      security.protocol: SASL_SSL
      sasl.mechanism: PLAIN
      sasl.jaas.config: org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="Endpoint=sb://XXXXXXXXXXXXXXXXXXXX.servicebus.windows.net/;SharedAccessKeyName=RootKeyAccess;SharedAccessKey=XXXXXXXXXXXXXXXXXXXX";
    tls:
      trustedCertificates: []
  mirrors:
  - sourceCluster: "my-kafka-cluster"
    targetCluster: "eventhub"
    sourceConnector:
      config:
        replication.factor: 1
        offset-syncs.topic.replication.factor: 1
        sync.topic.acls.enabled: "false"
    heartbeatConnector:
      config:
        heartbeats.topic.replication.factor: 1
    checkpointConnector:
      config:
        checkpoints.topic.replication.factor: 1
    topicsPattern: ".*"
    groupsPattern: ".*"
EOF

First issue were these warnings. I know it's only INFO, but the deployment keep crashing and will not stay running. I also know with Kafka these type of logs are vague at best.

2022-03-15 21:21:06,151 INFO [AdminClient clientId=adminclient-1] Node -1 disconnected. (org.apache.kafka.clients.NetworkClient) [kafka-admin-client-thread | adminclient-1]

2022-03-15 21:21:06,152 INFO [AdminClient clientId=adminclient-1] Cancelled in-flight METADATA request with correlation id 59 due to node -1 being disconnected (elapsed time since creation: 87ms, elapsed time since send: 87ms, request timeout: 16401ms) (org.apache.kafka.clients.NetworkClient) [kafka-admin-client-thread | adminclient-1]

Then I was able to catch this error in the logs. I can't tell where the problem lies and can't figure out how to resolve it.

2022-03-15 21:22:55,572 INFO [AdminClient clientId=adminclient-1] Metadata update failed (org.apache.kafka.clients.admin.internals.AdminMetadataManager) [kafka-admin-client-thread | adminclient-1]

>> org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call. Call: fetchMetadata

2022-03-15 21:22:55,574 INFO App info kafka.admin.client for adminclient-1 unregistered (org.apache.kafka.common.utils.AppInfoParser) [kafka-admin-client-thread | adminclient-1]
2022-03-15 21:22:55,575 INFO [AdminClient clientId=adminclient-1] Metadata update failed (org.apache.kafka.clients.admin.internals.AdminMetadataManager) [kafka-admin-client-thread | adminclient-1]
org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call. Call: fetchMetadata
2022-03-15 21:22:55,575 INFO [AdminClient clientId=adminclient-1] Timed out 1 remaining operation(s) during close. (org.apache.kafka.clients.admin.KafkaAdminClient) [kafka-admin-client-thread | adminclient-1]
2022-03-15 21:22:55,582 INFO Metrics scheduler closed (org.apache.kafka.common.metrics.Metrics) [kafka-admin-client-thread | adminclient-1]
2022-03-15 21:22:55,582 INFO Closing reporter org.apache.kafka.common.metrics.JmxReporter (org.apache.kafka.common.metrics.Metrics) [kafka-admin-client-thread | adminclient-1]
2022-03-15 21:22:55,582 INFO Metrics reporters closed (org.apache.kafka.common.metrics.Metrics) [kafka-admin-client-thread | adminclient-1]
2022-03-15 21:22:55,583 ERROR Stopping due to error (org.apache.kafka.connect.cli.ConnectDistributed) [main]

>> org.apache.kafka.connect.errors.ConnectException: Failed to connect to and describe Kafka cluster. Check workers broker connection and security properties.

at org.apache.kafka.connect.util.ConnectUtils.lookupKafkaClusterId(ConnectUtils.java:70)
at org.apache.kafka.connect.util.ConnectUtils.lookupKafkaClusterId(ConnectUtils.java:51)
at org.apache.kafka.connect.cli.ConnectDistributed.startConnect(ConnectDistributed.java:97)
at org.apache.kafka.connect.cli.ConnectDistributed.main(ConnectDistributed.java:80)

>> Caused by: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: listNodes

When I run a describe I get this error

Status:
  Conditions:
    Last Transition Time:  2022-03-15T21:25:49.149589Z
    Message:               Exceeded timeout of 300000ms while waiting for Deployment resource mirror-maker-eventhub-mirrormaker2 in namespace kafka-cloud to be ready
    Reason:                TimeoutException
    Status:                True
    Type:                  NotReady
  Label Selector:          strimzi.io/cluster=mirror-maker-eventhub,strimzi.io/name=mirror-maker-eventhub-mirrormaker2,strimzi.io/kind=KafkaMirrorMaker2
  Observed Generation:     1
  Replicas:                1
  URL:                     http://mirror-maker-eventhub-mirrormaker2-api.kafka-cloud.svc:8083
Events:                    <none>

When I add the authentication block in mm2:

    authentication: 
      type: plain 
      username: $ConnectionString
      passwordSecret: 
        secretName: eventhubssecret 
        password: eventhubspassword

I get this error, even though the secret exists and is validated.

Status:
  Conditions:
    Last Transition Time:  2022-03-15T23:30:31.200105Z
    Message:               PLAIN authentication selected, but username or password configuration is missing.
    Reason:                InvalidResourceException
    Status:                True
    Type:                  NotReady
  Observed Generation:     2
  Replicas:                0
Events:                    <none>

If anyone has any idea, it'd be greatly appreciated. Spent the entire day to no avail. I removed using secretes as it caused an error that the name and password were not supplied. It's weird.

Thank you.


Solution

  • Answer found.

    The issue was using cat <<EOF | kubectl apply -n kafka-cloud -f - ... EOF

    I think it's because of $ in the username. EH needs this as the actual username for the connection. Once I made the above into a file between cat <<EOF and the last EOF it ran from the CLI without changing anything.

    It worked.

    kubectl apply -n kafka-cloud -f fileName.yaml
    

    When working with EH use a file. Don't cat <<EOF it in.