Tags: google-compute-engine, kubernetes, gcloud, fluentd, stackdriver

How to setup error reporting in Stackdriver from kubernetes pods?


I'm a bit confused about how to set up error reporting in Kubernetes so that errors are visible in Google Cloud Console / Stackdriver "Error Reporting".

According to the documentation (https://cloud.google.com/error-reporting/docs/setting-up-on-compute-engine), we need to enable fluentd's "forward input plugin" and then send exception data from our apps. I think this approach would have worked if we had set up fluentd ourselves, but it's already pre-installed on every node, in a pod that just runs the gcr.io/google_containers/fluentd-gcp Docker image.

How do we enable forward input on those pods and make sure that port is available to every pod on the nodes? We also need this config to be used by default when we add more nodes to our cluster.

Any help would be appreciated. Maybe I'm looking at all this from the wrong angle?


Solution

  • The basic idea is to start a separate pod that receives structured logs over TCP and forwards them to Cloud Logging, similar to a locally running fluentd agent. The steps I used are below.

    (Unfortunately, the logging support built into Docker and Kubernetes cannot be used: it forwards each line of text from stdout/stderr as a separate log entry, which prevents Error Reporting from seeing complete stack traces.)
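To make the stack-trace problem concrete, here is a small Python sketch. `line_based_entries` is a simplified, hypothetical stand-in for a line-oriented collector (it is not the actual Docker or fluentd code): every output line becomes its own entry. `structured_entry` instead keeps the whole traceback in a single `message` field, which is what sending structured records to a forwarder preserves.

```python
import json
import traceback


def capture_traceback() -> str:
    """Raise and catch an exception, returning its formatted stack trace."""
    try:
        raise ValueError("boom")
    except ValueError:
        return traceback.format_exc()


def line_based_entries(output: str) -> list[str]:
    """Hypothetical line-oriented collector: one entry per non-empty line."""
    return [line for line in output.splitlines() if line]


def structured_entry(trace: str) -> str:
    """One JSON record carrying the complete stack trace in a single field."""
    return json.dumps({"message": trace})


trace = capture_traceback()
entries = line_based_entries(trace)
print(len(entries) > 1)  # True: the trace is scattered over several entries
print(json.loads(structured_entry(trace))["message"] == trace)  # True: kept whole
```

Each of the line-based entries on its own ("Traceback (most recent call last):", a `File ...` line, "ValueError: boom") looks like an unrelated log line, so nothing downstream can reassemble the error.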

    Create a docker image for a fluentd forwarder using a Dockerfile as follows:

    FROM gcr.io/google_containers/fluentd-gcp:1.18
    
    COPY fluentd-forwarder.conf /etc/google-fluentd/google-fluentd.conf
    

    Where fluentd-forwarder.conf contains the following:

    <source>
      type forward
      port 24224
    </source>
    
    <match **>
      type google_cloud
      buffer_chunk_limit 2M
      buffer_queue_limit 24
      flush_interval 5s
      max_retry_wait 30
      disable_retry_limit
    </match>
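The `<match **>` settings bound how much the forwarder buffers and how quickly it retries when Cloud Logging is unreachable. A rough back-of-envelope in Python, under stated assumptions that are not part of the original answer: fluentd reads `2M` as 2 MiB, `retry_wait` keeps its fluentd default of 1 second (it is not set in the config), and the random jitter fluentd adds to each wait is ignored.

```python
# Rough sizing for the <match **> block. Assumptions: "2M" = 2 * 1024 * 1024
# bytes, retry_wait defaults to 1 second, and fluentd's jitter is ignored.
MIB = 1024 * 1024

buffer_chunk_limit = 2 * MIB  # buffer_chunk_limit 2M
buffer_queue_limit = 24       # buffer_queue_limit 24
max_retry_wait = 30           # max_retry_wait 30 (seconds)
retry_wait = 1                # assumed default, not set in the config


def max_queued_bytes(chunk_limit: int, queue_limit: int) -> int:
    """Approximate upper bound on bytes held in queued chunks."""
    return chunk_limit * queue_limit


def retry_schedule(attempts: int, base: int, cap: int) -> list[int]:
    """Exponential backoff doubling from `base`, capped at `cap` seconds."""
    return [min(base * 2 ** n, cap) for n in range(attempts)]


print(max_queued_bytes(buffer_chunk_limit, buffer_queue_limit) // MIB)  # 48
print(retry_schedule(7, retry_wait, max_retry_wait))  # [1, 2, 4, 8, 16, 30, 30]
```

With `disable_retry_limit`, fluentd keeps retrying at the capped interval instead of giving up, so during a long outage it is roughly the queue limit (about 48 MiB here), not a retry count, that eventually causes new events to be rejected.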
    

    Then build and push the image:

    $ docker build -t gcr.io/###your project id###/fluentd-forwarder:v1 .
    $ gcloud docker push gcr.io/###your project id###/fluentd-forwarder:v1
    

    You need a replication controller (fluentd-forwarder-controller.yaml):

    apiVersion: v1
    kind: ReplicationController
    metadata:
      name: fluentd-forwarder
    spec:
      replicas: 1
      template:
        metadata:
          name: fluentd-forwarder
          labels:
            app: fluentd-forwarder
        spec:
          containers:
          - name: fluentd-forwarder
            image: gcr.io/###your project id###/fluentd-forwarder:v1
            env:
            - name: FLUENTD_ARGS
              value: -qq
            ports:
            - containerPort: 24224
    

    You also need a service (fluentd-forwarder-service.yaml):

    apiVersion: v1
    kind: Service
    metadata:
      name: fluentd-forwarder
    spec:
      selector:
        app: fluentd-forwarder
      ports:
      - protocol: TCP
        port: 24224
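The Service reaches the forwarder pod only because its `selector` matches the `labels` in the replication controller's pod template. A Service's `selector` is matched by equality: a pod is selected if it carries every key/value pair in the selector, and any extra labels on the pod are ignored. The sketch below models that rule in plain Python (it does not call the Kubernetes API); `app_pod_labels` is a hypothetical application pod.

```python
from typing import Mapping


def selector_matches(selector: Mapping[str, str], labels: Mapping[str, str]) -> bool:
    """True if the pod carries every key/value pair in the Service's selector."""
    return all(labels.get(key) == value for key, value in selector.items())


# Values copied from the two YAML files above.
service_selector = {"app": "fluentd-forwarder"}
forwarder_pod_labels = {"app": "fluentd-forwarder"}
# Hypothetical application pod that should not receive forwarder traffic.
app_pod_labels = {"app": "myapp"}

print(selector_matches(service_selector, forwarder_pod_labels))  # True
print(selector_matches(service_selector, app_pod_labels))        # False
```

If the two `app:` values ever drift apart, the Service still gets created but has no endpoints, and connections to port 24224 fail.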
    

    Then create the replication controller and service:

    $ kubectl create -f fluentd-forwarder-controller.yaml
    $ kubectl create -f fluentd-forwarder-service.yaml
    

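    Finally, in your application, instead of using 'localhost' and 24224 to connect to the fluentd agent as described on https://cloud.google.com/error-reporting/docs/setting-up-on-compute-engine, use the values of the environment variables FLUENTD_FORWARDER_SERVICE_HOST and FLUENTD_FORWARDER_SERVICE_PORT. (Kubernetes only injects these variables into pods created after the service exists, so create the service before starting your application pods.)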
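A minimal, dependency-free Python sketch of the application side, under several assumptions. It hand-encodes one event in the fluentd forward protocol's "Message mode" (a MessagePack array `[tag, time, record]`) and sends it to the address taken from the environment variables above. The record fields `message` and `serviceContext.service` are modelled on the Error Reporting setup page linked above and should be checked against the current docs; the tag `myapp.errors`, the service name `myapp`, the helper names, and the `localhost:24224` fallback are hypothetical choices for illustration.

```python
import os
import socket
import struct
import time
from typing import Any, Mapping


def _header(n: int, fix_base: int, code16: bytes, code32: bytes) -> bytes:
    """Length prefix for a MessagePack array or map with n elements."""
    if n < 16:
        return struct.pack("B", fix_base | n)
    if n < 2**16:
        return code16 + struct.pack(">H", n)
    return code32 + struct.pack(">I", n)


def pack(obj: Any) -> bytes:
    """Minimal MessagePack encoder for the types a log record needs."""
    if obj is None:
        return b"\xc0"
    if isinstance(obj, bool):  # check before int: bool is a subclass of int
        return b"\xc3" if obj else b"\xc2"
    if isinstance(obj, int):
        if 0 <= obj < 128:
            return struct.pack("B", obj)  # positive fixint
        if -32 <= obj < 0:
            return struct.pack("b", obj)  # negative fixint
        if obj >= 0:
            return b"\xcf" + struct.pack(">Q", obj)  # uint64
        return b"\xd3" + struct.pack(">q", obj)  # int64
    if isinstance(obj, float):
        return b"\xcb" + struct.pack(">d", obj)  # float64
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        n = len(data)
        if n < 32:
            return struct.pack("B", 0xA0 | n) + data  # fixstr
        if n < 2**8:
            return b"\xd9" + struct.pack("B", n) + data  # str8
        if n < 2**16:
            return b"\xda" + struct.pack(">H", n) + data  # str16
        return b"\xdb" + struct.pack(">I", n) + data  # str32
    if isinstance(obj, (list, tuple)):
        return _header(len(obj), 0x90, b"\xdc", b"\xdd") + b"".join(pack(x) for x in obj)
    if isinstance(obj, dict):
        body = b"".join(pack(k) + pack(v) for k, v in obj.items())
        return _header(len(obj), 0x80, b"\xde", b"\xdf") + body
    raise TypeError(f"cannot encode {type(obj).__name__}")


def forward_message(tag: str, record: Mapping[str, Any], timestamp: float) -> bytes:
    """One event in forward protocol Message mode: [tag, time, record]."""
    return pack([tag, int(timestamp), dict(record)])


def env_prefix(service_name: str) -> str:
    """Kubernetes upper-cases the service name and turns '-' into '_'."""
    return service_name.upper().replace("-", "_")


def forwarder_address(env: Mapping[str, str], service: str = "fluentd-forwarder") -> tuple[str, int]:
    """Read <PREFIX>_SERVICE_HOST/_PORT; fall back to localhost:24224 (hypothetical default)."""
    prefix = env_prefix(service)
    host = env.get(f"{prefix}_SERVICE_HOST", "localhost")
    port = int(env.get(f"{prefix}_SERVICE_PORT", "24224"))
    return host, port


def report_error(stack_trace: str, service: str = "myapp", tag: str = "myapp.errors") -> None:
    """Send one error event to the forwarder ('myapp' names are hypothetical)."""
    host, port = forwarder_address(os.environ)
    record = {"message": stack_trace, "serviceContext": {"service": service}}
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(forward_message(tag, record, time.time()))
```

In an `except` block you would then call something like `report_error(traceback.format_exc())`. A real application would more likely use an existing fluent-logger client library for its language than a hand-rolled encoder; the encoder here only keeps the sketch self-contained.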