Tags: kubernetes, horizontal-pod-autoscaler

How to scale my app on nginx metrics without Prometheus?


I want to scale my application based on custom metrics (RPS or active connections, in this case), without having to set up Prometheus or use any external service. I can expose this API from my web app. What are my options?


Solution

  • Monitoring different types of metrics (e.g. custom metrics) is the foundation of more stable and reliable Kubernetes workloads. As discussed in the comments section, to monitor custom metrics it is recommended to use tools designed for this purpose rather than inventing a workaround. I'm glad that in this case the final decision was to use Prometheus and KEDA to properly scale the web application.

    For other community members facing a similar decision, I would like to briefly show how this setup works.


    To use Prometheus as a scaler for KEDA, we need to install and configure Prometheus. There are many different ways to install Prometheus; choose the one that suits your needs.

    I've installed the kube-prometheus stack with Helm:
    NOTE: I allowed Prometheus to discover all PodMonitors/ServiceMonitors within its namespace, without applying label filtering, by setting the prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues and prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues values to false.

    $ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    $ helm repo update
    $ helm install prom-1 prometheus-community/kube-prometheus-stack \
        --set prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues=false \
        --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false
    
    $ kubectl get pods
    NAME                                                     READY   STATUS    RESTARTS   AGE
    alertmanager-prom-1-kube-prometheus-sta-alertmanager-0   2/2     Running   0          2m29s
    prom-1-grafana-865d4c8876-8zdhm                          3/3     Running   0          2m34s
    prom-1-kube-prometheus-sta-operator-6b5d5d8df5-scdjb     1/1     Running   0          2m34s
    prom-1-kube-state-metrics-74b4bb7857-grbw9               1/1     Running   0          2m34s
    prom-1-prometheus-node-exporter-2v2s6                    1/1     Running   0          2m34s
    prom-1-prometheus-node-exporter-4vc9k                    1/1     Running   0          2m34s
    prom-1-prometheus-node-exporter-7jchl                    1/1     Running   0          2m35s
    prometheus-prom-1-kube-prometheus-sta-prometheus-0       2/2     Running   0          2m28s
    
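    As a quick sanity check, we can port-forward the Prometheus Service (the name comes from the Helm release and is the same one referenced later in the ScaledObject's serverAddress) and hit the readiness endpoint, which should report that the server is ready. Run the port-forward in a separate terminal, then:

    $ kubectl port-forward svc/prom-1-kube-prometheus-sta-prometheus 9090:9090
    $ curl -s http://localhost:9090/-/ready
    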

    Then we can deploy an application that will be monitored by Prometheus. I've created a simple application that exposes some metrics (such as nginx_vts_server_requests_total) on the /status/format/prometheus path:

    $ cat app-1.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: app-1
    spec:
      selector:
        matchLabels:
          app: app-1
      template:
        metadata:
          labels:
            app: app-1
        spec:
          containers:
          - name: app-1
            image: mattjcontainerregistry/nginx-vts:v1.0
            resources:
              limits:
                cpu: 50m
              requests:
                cpu: 50m
            ports:
            - containerPort: 80
              name: http
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: app-1
      labels:
        app: app-1
    spec:
      ports:
      - port: 80
        targetPort: 80
        name: http
      selector:
        app: app-1
      type: LoadBalancer
    
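    Apply the manifest and verify locally that the exporter path really serves the metrics we plan to scale on (8080 below is just an arbitrary local port; run the port-forward in a separate terminal):

    $ kubectl apply -f app-1.yaml
    deployment.apps/app-1 created
    service/app-1 created
    
    $ kubectl port-forward deploy/app-1 8080:80
    $ curl -s http://localhost:8080/status/format/prometheus | grep nginx_vts_server_requests_total
    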

    Next, create a ServiceMonitor that describes how to monitor our app-1 application:

    $ cat servicemonitor.yaml
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: app-1
      labels:
        app: app-1
    spec:
      selector:
        matchLabels:
          app: app-1
      endpoints:
      - interval: 15s
        path: "/status/format/prometheus"
        port: http
    
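    Apply it as well:

    $ kubectl apply -f servicemonitor.yaml
    servicemonitor.monitoring.coreos.com/app-1 created
    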

    After waiting some time, let's check the app-1 logs to make sure that it is being scraped correctly:

    $ kubectl get pods | grep app-1
    app-1-5986d56f7f-2plj5                                   1/1     Running   0          35s
    
    $ kubectl logs -f app-1-5986d56f7f-2plj5
    10.44.1.6 - - [07/Feb/2022:16:31:11 +0000] "GET /status/format/prometheus HTTP/1.1" 200 2742 "-" "Prometheus/2.33.1" "-"
    10.44.1.6 - - [07/Feb/2022:16:31:26 +0000] "GET /status/format/prometheus HTTP/1.1" 200 3762 "-" "Prometheus/2.33.1" "-"
    10.44.1.6 - - [07/Feb/2022:16:31:41 +0000] "GET /status/format/prometheus HTTP/1.1" 200 3762 "-" "Prometheus/2.33.1" "-"
    

    Now it's time to deploy KEDA. There are a few approaches to deploying the KEDA runtime, as described in the KEDA documentation. I chose to install KEDA with Helm because it's very simple :-)

    $ helm repo add kedacore https://kedacore.github.io/charts
    $ helm repo update
    $ kubectl create namespace keda
    $ helm install keda kedacore/keda --namespace keda
    
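    To confirm the installation, check that the KEDA operator and metrics API server pods are up, and that KEDA has registered itself as the provider of the external metrics API:

    $ kubectl get pods -n keda
    $ kubectl get apiservice v1beta1.external.metrics.k8s.io
    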

    The last thing we need to create is a ScaledObject, which defines how KEDA should scale our application and what the triggers are. In the example below, I used the nginx_vts_server_requests_total metric.
    NOTE: For more information on the prometheus trigger, see the Trigger Specification documentation.

    $ cat scaled-object.yaml
    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: scaled-app-1
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: app-1
      pollingInterval: 30                               
      cooldownPeriod:  120                              
      minReplicaCount: 1                                
      maxReplicaCount: 5                               
      advanced:                                         
        restoreToOriginalReplicaCount: false            
        horizontalPodAutoscalerConfig:                  
          behavior:                                     
            scaleDown:
              stabilizationWindowSeconds: 300
              policies:
              - type: Percent
                value: 100
                periodSeconds: 15
      triggers:
      - type: prometheus
        metadata:
          serverAddress: http://prom-1-kube-prometheus-sta-prometheus.default.svc:9090
          metricName: nginx_vts_server_requests_total
          query: sum(rate(nginx_vts_server_requests_total{code="2xx", service="app-1"}[2m])) # Note: query must return a vector/scalar single element response
          threshold: '10'
      
    $ kubectl apply -f scaled-object.yaml
    scaledobject.keda.sh/scaled-app-1 created
    
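    Under the hood, KEDA creates and manages a regular HPA for us, named keda-hpa-<scaled-object-name> (hence keda-hpa-scaled-app-1 below). Both objects can be inspected as usual:

    $ kubectl get scaledobject scaled-app-1
    $ kubectl get hpa keda-hpa-scaled-app-1
    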

    Finally, we can check if the app-1 application scales correctly based on the number of requests:

    $ for a in $(seq 1 10000); do curl <PUBLIC_IP_APP_1> 1>/dev/null 2>&1; done
    
    $ kubectl get hpa -w
    NAME                    REFERENCE          TARGETS           MINPODS   MAXPODS   REPLICAS
    keda-hpa-scaled-app-1   Deployment/app-1   0/10 (avg)        1         5         1
    keda-hpa-scaled-app-1   Deployment/app-1   15/10 (avg)       1         5         2
    keda-hpa-scaled-app-1   Deployment/app-1   12334m/10 (avg)   1         5         3
    keda-hpa-scaled-app-1   Deployment/app-1   13250m/10 (avg)   1         5         4
    keda-hpa-scaled-app-1   Deployment/app-1   12600m/10 (avg)   1         5         5
    
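    The TARGETS column shows the query result divided by the current number of replicas, in the HPA's milli-unit notation (12334m ≈ 12.3 requests/s per pod). With an average-value target, the HPA aims at desiredReplicas = ceil(totalMetricValue / threshold): an average of ~12.3 rps across 3 replicas is a total of ~37 rps, so ceil(37 / 10) = 4 replicas, which matches the progression above.
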
    $ kubectl get pods | grep app-1
    app-1-5986d56f7f-2plj5                                   1/1     Running   0          36m
    app-1-5986d56f7f-5nrqd                                   1/1     Running   0          77s
    app-1-5986d56f7f-78jw8                                   1/1     Running   0          94s
    app-1-5986d56f7f-bl859                                   1/1     Running   0          62s
    app-1-5986d56f7f-xlfp6                                   1/1     Running   0          45s
    

    As you can see above, our application has been correctly scaled to 5 replicas.
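
    Once the load generator stops, the per-pod average falls back below the threshold and, after the 300-second scaleDown stabilization window configured in the ScaledObject, the HPA gradually scales the Deployment back down to minReplicaCount: 1.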