
How to monitor uptime of a Kubernetes service which is not public in Azure AKS


I have a deployment with multiple pods in Azure Kubernetes Service.
A K8s service is used to connect to the deployment's pods.
The service has a private IP accessible in Azure Virtual Network. The service type is LoadBalancer.
I want to monitor and see if the service is up. If it is not up, trigger an email alert.

I have identified two options:

Option 1:
I enabled AKS diagnostics so that I get the service logs. When I run the query below, I can see service failure logs. I think I can use these logs in Azure Monitor to trigger an alert, but I still need to verify that this works for every type of failure.

KubeEvents
| where TimeGenerated > ago(7d)
| where not(isempty(Namespace))
| where ObjectKind == 'Service'

Option 2:
Create an Azure Function with an HTTPS API enabled so I can call it externally from Pingdom. The function has to run on App Service with a VM so that it can access private IPs and the service (using a VM increases the cost). The function calls the service's private IP: if the service returns 200, the function returns 200; otherwise, it returns an error code. Pingdom then keeps the uptime history and sends an alert when the service is down.
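To make the check in Option 2 concrete, here is a minimal sketch of the probe logic in Python, using only the standard library. It is not the actual function: `SERVICE_URL`, the `/healthz` path, the `probe` name, and the choice of 503 as the error code are all assumptions for illustration. The Azure Functions HTTP trigger wrapper is left out; the handler would simply return the status that `probe` produces.

```python
import urllib.error
import urllib.request

# Hypothetical private address of the internal LoadBalancer service (assumption).
SERVICE_URL = "http://10.0.0.10/healthz"


def probe(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return 200 if the service answers with HTTP 200, otherwise 503."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 if response.status == 200 else 503
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return 503


if __name__ == "__main__":
    # Pingdom would see this value as the HTTP status of the function.
    print(probe(SERVICE_URL))
```

The `opener` parameter exists only so the probe can be exercised without network access; in the function itself the default `urllib.request.urlopen` is used.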

Summary:
I am not 100% sure about option one. For the second option, it seems like doing too much work, and I think that there should be a better and more robust way of doing it.

So I am interested in getting feedback from some Azure and K8s experts who dealt with the problem and solved it in a more robust way.


Solution

  • The Azure Application Insights documentation describes two [private monitoring options](https://learn.microsoft.com/en-us/azure/azure-monitor/app/availability-private-test):

    1. Allowing limited inbound connectivity
    2. Using Azure Functions, as you have described in your Option 2.

    Personally I prefer endpoint monitoring to be more independent from the resource that's hosting the service.