Tags: kubernetes, horizontal-pod-autoscaler

How to scale my app on nginx metrics without Prometheus?


I want to scale my application based on custom metrics (RPS or active connections, in this case), without having to set up Prometheus or use any external service. I can expose this API from my web app. What are my options?


Solution

  • Monitoring different types of metrics (e.g. custom metrics) is the foundation of more stable and reliable Kubernetes workloads. As discussed in the comments section, to monitor custom metrics it is recommended to use tools designed for this purpose rather than inventing a workaround. I'm glad that in this case the final decision was to use Prometheus and KEDA to properly scale the web application.

    For other community members facing a similar decision, I would like to briefly show how this setup works.


    To use Prometheus as a scaler for KEDA, we need to install and configure Prometheus. There are many different ways to install Prometheus; choose the one that suits your needs.

    I've installed the kube-prometheus stack with Helm:
    NOTE: I allowed Prometheus to discover all PodMonitors/ServiceMonitors within its namespace, without applying label filtering, by setting the prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues and prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues values to false.

    $ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    $ helm repo update
    $ helm install prom-1 prometheus-community/kube-prometheus-stack \
        --set prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues=false \
        --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false
    
    $ kubectl get pods
    NAME                                                     READY   STATUS    RESTARTS   AGE
    alertmanager-prom-1-kube-prometheus-sta-alertmanager-0   2/2     Running   0          2m29s
    prom-1-grafana-865d4c8876-8zdhm                          3/3     Running   0          2m34s
    prom-1-kube-prometheus-sta-operator-6b5d5d8df5-scdjb     1/1     Running   0          2m34s
    prom-1-kube-state-metrics-74b4bb7857-grbw9               1/1     Running   0          2m34s
    prom-1-prometheus-node-exporter-2v2s6                    1/1     Running   0          2m34s
    prom-1-prometheus-node-exporter-4vc9k                    1/1     Running   0          2m34s
    prom-1-prometheus-node-exporter-7jchl                    1/1     Running   0          2m35s
    prometheus-prom-1-kube-prometheus-sta-prometheus-0       2/2     Running   0          2m28s
    
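    As a quick sanity check, we can port-forward the Prometheus Service (the name comes from the Helm release and is the same one referenced later in the ScaledObject's serverAddress) and hit the readiness endpoint, which should report that the server is ready. Run the port-forward in a separate terminal, then:

    $ kubectl port-forward svc/prom-1-kube-prometheus-sta-prometheus 9090:9090
    $ curl -s http://localhost:9090/-/ready
    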

    Then we can deploy an application that will be monitored by Prometheus. I've created a simple application that exposes some metrics (such as nginx_vts_server_requests_total) on the /status/format/prometheus path:

    $ cat app-1.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: app-1
    spec:
      selector:
        matchLabels:
          app: app-1
      template:
        metadata:
          labels:
            app: app-1
        spec:
          containers:
          - name: app-1
            image: mattjcontainerregistry/nginx-vts:v1.0
            resources:
              limits:
                cpu: 50m
              requests:
                cpu: 50m
            ports:
            - containerPort: 80
              name: http
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: app-1
      labels:
        app: app-1
    spec:
      ports:
      - port: 80
        targetPort: 80
        name: http
      selector:
        app: app-1
      type: LoadBalancer
    
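    Apply the manifest and verify locally that the exporter path really serves the metrics we plan to scale on (8080 below is just an arbitrary local port; run the port-forward in a separate terminal):

    $ kubectl apply -f app-1.yaml
    deployment.apps/app-1 created
    service/app-1 created
    
    $ kubectl port-forward deploy/app-1 8080:80
    $ curl -s http://localhost:8080/status/format/prometheus | grep nginx_vts_server_requests_total
    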

    Next, create a ServiceMonitor that describes how to monitor our app-1 application:

    $ cat servicemonitor.yaml
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: app-1
      labels:
        app: app-1
    spec:
      selector:
        matchLabels:
          app: app-1
      endpoints:
      - interval: 15s
        path: "/status/format/prometheus"
        port: http
    
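    Apply it as well:

    $ kubectl apply -f servicemonitor.yaml
    servicemonitor.monitoring.coreos.com/app-1 created
    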

    After waiting some time, let's check the app-1 logs to make sure that it is being scraped correctly:

    $ kubectl get pods | grep app-1
    app-1-5986d56f7f-2plj5                                   1/1     Running   0          35s
    
    $ kubectl logs -f app-1-5986d56f7f-2plj5
    10.44.1.6 - - [07/Feb/2022:16:31:11 +0000] "GET /status/format/prometheus HTTP/1.1" 200 2742 "-" "Prometheus/2.33.1" "-"
    10.44.1.6 - - [07/Feb/2022:16:31:26 +0000] "GET /status/format/prometheus HTTP/1.1" 200 3762 "-" "Prometheus/2.33.1" "-"
    10.44.1.6 - - [07/Feb/2022:16:31:41 +0000] "GET /status/format/prometheus HTTP/1.1" 200 3762 "-" "Prometheus/2.33.1" "-"
    

    Now it's time to deploy KEDA. There are a few approaches to deploying the KEDA runtime, as described in the KEDA documentation. I chose to install KEDA with Helm because it's very simple :-)

    $ helm repo add kedacore https://kedacore.github.io/charts
    $ helm repo update
    $ kubectl create namespace keda
    $ helm install keda kedacore/keda --namespace keda
    
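    To confirm the installation, check that the KEDA operator and metrics API server pods are up, and that KEDA has registered itself as the provider of the external metrics API:

    $ kubectl get pods -n keda
    $ kubectl get apiservice v1beta1.external.metrics.k8s.io
    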

    The last thing we need to create is a ScaledObject, which defines how KEDA should scale our application and what the triggers are. In the example below, I used the nginx_vts_server_requests_total metric.
    NOTE: For more information on the prometheus trigger, see the Trigger Specification documentation.

    $ cat scaled-object.yaml
    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: scaled-app-1
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: app-1
      pollingInterval: 30                               
      cooldownPeriod:  120                              
      minReplicaCount: 1                                
      maxReplicaCount: 5                               
      advanced:                                         
        restoreToOriginalReplicaCount: false            
        horizontalPodAutoscalerConfig:                  
          behavior:                                     
            scaleDown:
              stabilizationWindowSeconds: 300
              policies:
              - type: Percent
                value: 100
                periodSeconds: 15
      triggers:
      - type: prometheus
        metadata:
          serverAddress: http://prom-1-kube-prometheus-sta-prometheus.default.svc:9090
          metricName: nginx_vts_server_requests_total
          query: sum(rate(nginx_vts_server_requests_total{code="2xx", service="app-1"}[2m])) # Note: query must return a vector/scalar single element response
          threshold: '10'
      
    $ kubectl apply -f scaled-object.yaml
    scaledobject.keda.sh/scaled-app-1 created
    
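    Under the hood, KEDA creates and manages a regular HPA for us, named keda-hpa-<scaled-object-name> (hence keda-hpa-scaled-app-1 below). Both objects can be inspected as usual:

    $ kubectl get scaledobject scaled-app-1
    $ kubectl get hpa keda-hpa-scaled-app-1
    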

    Finally, we can check if the app-1 application scales correctly based on the number of requests:

    $ for a in $(seq 1 10000); do curl <PUBLIC_IP_APP_1> 1>/dev/null 2>&1; done
    
    $ kubectl get hpa -w
    NAME                    REFERENCE          TARGETS           MINPODS   MAXPODS   REPLICAS
    keda-hpa-scaled-app-1   Deployment/app-1   0/10 (avg)        1         5         1
    keda-hpa-scaled-app-1   Deployment/app-1   15/10 (avg)       1         5         2
    keda-hpa-scaled-app-1   Deployment/app-1   12334m/10 (avg)   1         5         3
    keda-hpa-scaled-app-1   Deployment/app-1   13250m/10 (avg)   1         5         4
    keda-hpa-scaled-app-1   Deployment/app-1   12600m/10 (avg)   1         5         5
    
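    The TARGETS column shows the query result divided by the current number of replicas, in the HPA's milli-unit notation (12334m ≈ 12.3 requests/s per pod). With an average-value target, the HPA aims at desiredReplicas = ceil(totalMetricValue / threshold): an average of ~12.3 rps across 3 replicas is a total of ~37 rps, so ceil(37 / 10) = 4 replicas, which matches the progression above.
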
    $ kubectl get pods | grep app-1
    app-1-5986d56f7f-2plj5                                   1/1     Running   0          36m
    app-1-5986d56f7f-5nrqd                                   1/1     Running   0          77s
    app-1-5986d56f7f-78jw8                                   1/1     Running   0          94s
    app-1-5986d56f7f-bl859                                   1/1     Running   0          62s
    app-1-5986d56f7f-xlfp6                                   1/1     Running   0          45s
    

    As you can see above, our application has been correctly scaled to 5 replicas.
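
    Once the load generator stops, the per-pod average falls back below the threshold and, after the 300-second scaleDown stabilization window configured in the ScaledObject, the HPA gradually scales the Deployment back down to minReplicaCount: 1.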