Tags: apache-kafka, knative, knative-eventing

Knative Eventing Dead Letter Sink Not Triggered


I've got sequences working in my Knative environment. We're trying to configure and confirm that the DLQ/dead letter sink works so we can write tests against sequences, but I can't for the life of me get Knative to send anything to the dead letter sink. I've approached this two ways. The first was setting up a Broker, Trigger, Services, and a Sequence. In the Broker I defined a service to use for the DLQ, and I set up a service in the sequence that intentionally returns a non-200 status. When I view the logs for the channel dispatcher in the knative-eventing namespace, it does appear to register that there was a failure.
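
For reference, the Trigger wiring the Broker to the Sequence looked roughly like this (reconstructed from memory, so take the exact names as illustrative):

apiVersion: eventing.knative.dev/v1
kind: Trigger
metadata:
  name: test-trigger
  namespace: kafka
spec:
  broker: default
  subscriber:
    ref:
      # a Sequence is Addressable, so it can be a Trigger's subscriber
      apiVersion: flows.knative.dev/v1
      kind: Sequence
      name: test-sequence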

I read some things about the default MT Broker maybe not handling DLQ correctly, so I installed Kafka. I got that all working and, essentially, it appears to do the same thing.

I started to wonder whether you simply can't do DLQ within a sequence. After all, the documentation only talks about DLQ with subscriptions and brokers, and maybe Knative believes the message was successfully delivered from the broker to the sequence, even if it dies within the sequence. So I manually set up a channel and a subscription and sent the data straight to the channel, and again what I got was essentially the same thing, which is:

The sequence stops on whatever step doesn't return a 2xx status code, but nothing gets sent to the DLQ. I even pointed the subscription straight at the service (instead of a sequence); that service returned a 500 and still nothing went to the DLQ.

The log entry below is from the channel dispatcher pod running in the knative-eventing namespace. It looks basically the same with the in-memory channel or Kafka, i.e. expected 2xx, got 500.

{"level":"info","ts":"2021-11-30T16:01:05.313Z","logger":"kafkachannel-dispatcher","caller":"consumer/consumer_handler.go:162","msg":"Failure while handling a message","knative.dev/pod":"kafka-ch-dispatcher-5bb8f84976-rpd87","knative.dev/controller":"knative.dev.eventing-kafka.pkg.channel.consolidated.reconciler.dispatcher.Reconciler","knative.dev/kind":"messaging.knative.dev.KafkaChannel","knative.dev/traceid":"957c394a-1636-44ad-b024-fb0dde9c8440","knative.dev/key":"kafka/test-sequence-kn-sequence-0","topic":"knative-messaging-kafka.kafka.test-sequence-kn-sequence-0","partition":0,"offset":4,"error":"unable to complete request to http://cetf.kafka.svc.cluster.local: unexpected HTTP response, expected 2xx, got 500"}
{"level":"warn","ts":"2021-11-30T16:01:05.313Z","logger":"kafkachannel-dispatcher","caller":"dispatcher/dispatcher.go:314","msg":"Error in consumer group","knative.dev/pod":"kafka-ch-dispatcher-5bb8f84976-rpd87","error":"unable to complete request to http://cetf.kafka.svc.cluster.local: unexpected HTTP response, expected 2xx, got 500"}

Notes on setup: I deployed literally everything to the same namespace for testing. I essentially followed the guide here to set up my broker for the broker/trigger approach and to deploy Kafka. My broker looked like this:

apiVersion: eventing.knative.dev/v1
kind: Broker
metadata:
  annotations:
    # case-sensitive
    eventing.knative.dev/broker.class: Kafka
  name: default
  namespace: kafka
spec:
  # Configuration specific to this broker.
  config:
    apiVersion: v1
    kind: ConfigMap
    name: kafka-broker-config
    namespace: knative-eventing
  # Optional dead letter sink, you can specify either:
  #  - deadLetterSink.ref, which is a reference to a Callable
  #  - deadLetterSink.uri, which is an absolute URI to a Callable
  #    (it can potentially be out of the Kubernetes cluster)
  delivery:
    deadLetterSink:
      ref:
        apiVersion: serving.knative.dev/v1
        kind: Service
        name: dlq
        namespace: kafka
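
The kafka-broker-config ConfigMap it references is essentially the stock one from the install guide; mine looked roughly like this (the bootstrap.servers value is a placeholder for whatever your cluster exposes):

apiVersion: v1
kind: ConfigMap
metadata:
  name: kafka-broker-config
  namespace: knative-eventing
data:
  default.topic.partitions: "10"
  default.topic.replication.factor: "1"
  # address of the Kafka bootstrap service in your cluster
  bootstrap.servers: "my-cluster-kafka-bootstrap.kafka:9092"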

When I manually created the channel and subscription, my subscription looked like this:

apiVersion: messaging.knative.dev/v1
kind: Subscription
metadata:
  name: test-sub # Name of the Subscription.
  namespace: kafka
spec:
  channel:
    apiVersion: messaging.knative.dev/v1beta1
    kind: KafkaChannel
    name: test-channel
  delivery:
    deadLetterSink:
      ref:
        apiVersion: serving.knative.dev/v1
        kind: Service
        name: dlq
        namespace: kafka
    # retry settings sit at the delivery level, not under deadLetterSink
    retry: 1
    backoffPolicy: linear
    backoffDelay: PT1S
  subscriber:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: cetf
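
For completeness, the channel it points at was just a bare KafkaChannel (the partition/replication values are simply what I used for testing):

apiVersion: messaging.knative.dev/v1beta1
kind: KafkaChannel
metadata:
  name: test-channel
  namespace: kafka
spec:
  numPartitions: 1
  replicationFactor: 1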

No matter what I do I NEVER see the DLQ pod spin up. I've adjusted the retry settings, waited and waited, used the default channel/broker, Kafka, etc. I simply never see the pod run. Is there something I'm missing? What on earth could be wrong? If I set the subscriber to a junk URI the DLQ pod does spin up, but shouldn't it also spin up when the service it sends events to returns error codes?

Can anyone provide a couple of very basic YAML files to deploy the simplest version of a working DLQ to test with?


Solution

  • I never found an example of this in the docs, but the API docs for SequenceStep do show a delivery property which, when set, routes failed deliveries to the dead letter sink.

    steps:
      - ref:
          apiVersion: serving.knative.dev/v1
          kind: Service
          name: service-step
        delivery:
          # DLS to an event-display service
          deadLetterSink:
            ref:
              apiVersion: serving.knative.dev/v1
              kind: Service
              name: dlq-service
              namespace: ns-name
    

    It seems odd to have to specify a delivery for EVERY step and not just the sequence as a whole.
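
    For a very basic end-to-end test, something like the following should work: an event-display Service as the DLQ target, plus a Sequence whose step carries the delivery block. (The event-display image path is the one commonly shown in the Knative docs; adjust it and the names/namespace for your install.)

    # DLQ target: logs any event it receives
    apiVersion: serving.knative.dev/v1
    kind: Service
    metadata:
      name: dlq-service
      namespace: ns-name
    spec:
      template:
        spec:
          containers:
            - image: gcr.io/knative-releases/knative.dev/eventing/cmd/event_display
    ---
    # Sequence whose single step fails over to the DLQ
    apiVersion: flows.knative.dev/v1
    kind: Sequence
    metadata:
      name: test-sequence
      namespace: ns-name
    spec:
      # channelTemplate is optional; omit it to use the cluster's default channel
      channelTemplate:
        apiVersion: messaging.knative.dev/v1beta1
        kind: KafkaChannel
      steps:
        - ref:
            apiVersion: serving.knative.dev/v1
            kind: Service
            name: service-step
          delivery:
            deadLetterSink:
              ref:
                apiVersion: serving.knative.dev/v1
                kind: Service
                name: dlq-service
                namespace: ns-name

    Send an event to the sequence, have service-step return a non-2xx status, and the event should show up in the dlq-service logs.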