Search code examples
redishandleristiorate-limitingamazon-eks

Debugging istio rate limiting handler


I'm trying to apply rate limiting on some of our internal services (inside the mesh).

I used the example from the docs and generated redis rate limiting configurations that include a (redis) handler, quota instance, quota spec, quota spec binding and rule to apply the handler.

This redis handler:

apiVersion: config.istio.io/v1alpha2
kind: handler
metadata:
  name: redishandler
  namespace: istio-system
spec:
  compiledAdapter: redisquota
  params:
    redisServerUrl: <REDIS>:6379
    connectionPoolSize: 10
    quotas:
    - name: requestcountquota.instance.istio-system
      maxAmount: 10
      validDuration: 100s
      rateLimitAlgorithm: FIXED_WINDOW
      overrides:
      - dimensions:
          destination: s1
        maxAmount: 1
      - dimensions:
          destination: s3
        maxAmount: 1
      - dimensions:
          destination: s2
        maxAmount: 1

The quota instance (I'm only interested in limiting by destination at the moment):

apiVersion: config.istio.io/v1alpha2
kind: instance
metadata:
  name: requestcountquota
  namespace: istio-system
spec:
  compiledTemplate: quota
  params:
    dimensions:
      destination: destination.labels["app"] | destination.service.host | "unknown"

A quota spec, charging 1 per request if I understand correctly:

apiVersion: config.istio.io/v1alpha2
kind: QuotaSpec
metadata:
  name: request-count
  namespace: istio-system
spec:
  rules:
  - quotas:
    - charge: 1
      quota: requestcountquota

A quota binding spec that all participating services pre-fetch. I also tried with service: "*" which also did nothing.

apiVersion: config.istio.io/v1alpha2
kind: QuotaSpecBinding
metadata:
  name: request-count
  namespace: istio-system
spec:
  quotaSpecs:
  - name: request-count
    namespace: istio-system
  services:
  - name: s2
    namespace: default
  - name: s3
    namespace: default
  - name: s1
    namespace: default
    # - service: '*'  # Uncomment this to bind *all* services to request-count

A rule to apply the handler. Currently on all occasions (tried with matches but didn't change anything as well):

apiVersion: config.istio.io/v1alpha2
kind: rule
metadata:
  name: quota
  namespace: istio-system
spec:
  actions:
  - handler: redishandler
    instances:
    - requestcountquota

The VirtualService definitions are pretty similar for all participants:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: s1
spec:
  hosts:
  - s1

  http:
  - route:
    - destination:
        host: s1

The problem is nothing really happens and no rate limiting takes place. I tested with curl from pods inside the mesh. The redis instance is empty (no keys on db 0, which I assume is what the rate limiting would use) so I know it can't practically rate-limit anything.

The handler seems to be configured properly (how can I make sure?) because I had some errors in it which were reported in mixer (policy). There are still some errors but none which I associate to this problem or the configuration. The only line in which redis handler is mentioned is this:

2019-12-17T13:44:22.958041Z info    adapters    adapter closed all scheduled daemons and workers    {"adapter": "redishandler.istio-system"}   

But its unclear if its a problem or not. I assume its not.

These are the rest of the lines from the reload once I deploy:

2019-12-17T13:44:22.601644Z info    Built new config.Snapshot: id='43'
2019-12-17T13:44:22.601866Z info    adapters    getting kubeconfig from: "" {"adapter": "kubernetesenv.istio-system"}
2019-12-17T13:44:22.601881Z warn    Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2019-12-17T13:44:22.602718Z info    adapters    Waiting for kubernetes cache sync...    {"adapter": "kubernetesenv.istio-system"}
2019-12-17T13:44:22.903844Z info    adapters    Cache sync successful.  {"adapter": "kubernetesenv.istio-system"}
2019-12-17T13:44:22.903878Z info    adapters    getting kubeconfig from: "" {"adapter": "kubernetesenv.istio-system"}
2019-12-17T13:44:22.903882Z warn    Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2019-12-17T13:44:22.904808Z info    Setting up event handlers
2019-12-17T13:44:22.904939Z info    Starting Secrets controller
2019-12-17T13:44:22.904991Z info    Waiting for informer caches to sync
2019-12-17T13:44:22.957893Z info    Cleaning up handler table, with config ID:42
2019-12-17T13:44:22.957924Z info    adapters    deleted remote controller   {"adapter": "kubernetesenv.istio-system"}
2019-12-17T13:44:22.957999Z info    adapters    adapter closed all scheduled daemons and workers    {"adapter": "prometheus.istio-system"}
2019-12-17T13:44:22.958041Z info    adapters    adapter closed all scheduled daemons and workers    {"adapter": "redishandler.istio-system"}   
2019-12-17T13:44:22.958065Z info    adapters    shutting down daemon... {"adapter": "kubernetesenv.istio-system"}
2019-12-17T13:44:22.958050Z info    adapters    shutting down daemon... {"adapter": "kubernetesenv.istio-system"}
2019-12-17T13:44:22.958096Z info    adapters    shutting down daemon... {"adapter": "kubernetesenv.istio-system"}
2019-12-17T13:44:22.958182Z info    adapters    shutting down daemon... {"adapter": "kubernetesenv.istio-system"}
2019-12-17T13:44:23.958109Z info    adapters    adapter closed all scheduled daemons and workers    {"adapter": "kubernetesenv.istio-system"}
2019-12-17T13:55:21.042131Z info    transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2019-12-17T14:14:00.265722Z info    transport: loopyWriter.run returning. connection error: desc = "transport is closing"

I'm using the demo profile with disablePolicyChecks: false to enable rate limiting. This is on istio 1.4.0, deployed on EKS.

I also tried memquota (this is our staging environment) with low limits and nothing seems to work. I never got a 429 no matter how much I went over the rate limit configured.

I don't know how to debug this and see where the configuration is wrong causing it to do nothing.

Any help is appreciated.


Solution

  • I too spent hours trying to decipher the documentation and get a sample working.

    According to the documentation, they recommended that we enable policy checks:

    https://istio.io/docs/tasks/policy-enforcement/rate-limiting/

    However when that did not work, I did an "istioctl profile dump", searched for policy, and tried several settings.

    I used Helm install and passed the following and then was able to get the described behaviour:

    --set global.disablePolicyChecks=false \ --set values.pilot.policy.enabled=true \ ===> this made it work, but it's not in the docs.