azure logging monitor

How to get a holistic view of an Azure environment


There's an awful lot of disjointed documentation on monitoring networks and resources in Azure. What I'm looking for is which pieces are needed to get information from VMs, NVA firewalls, Azure load balancers, and other network resources and network connectivity into a single pane of glass in Azure. I'm only concerned about Azure, not on-prem, for now.

I've come across Azure Monitor, Log Analytics workspaces, Event Hub, VM extensions, Network Watcher, Insights, etc., but I'm not sure which are required and which are not. One doc leads to the next and I end up with 30 tabs open. I'll also need to be able to push logs to other security devices such as a SIEM.

Does anyone know of a deployment guide that wraps this all up in a more logical fashion? Does anyone have any feedback on which pieces from Azure (not third parties) are required at a minimum to get a single pane of glass for viewing my Azure environment holistically?


Solution

  • General overview of observability in Azure

    Most likely, the thing you're looking for is Azure Monitor. It's an umbrella term for everything observability-related inside Azure.

    1. To store Metrics and Logs you need Log Analytics: it can query data with Kusto Query Language (KQL), visualize the results, and define Alerts on queries.

    2. Alerts are quite a complex beast, as they are spread across the entire cloud. The two types that I use the most:

      1. Log Analytics alerts (which I mentioned above)
      2. The Alerts tab, which is available in every Azure component view. For example, open a resource group and scroll down to the Monitoring section.

      Each component also has a subset of built-in metrics. You have probably noticed that many Azure components display some charts on the Overview view. For example, an Azure Storage Account displays Total egress, Total ingress, and other line charts. When you click on these charts you can customize them. These metrics and charts are free to use.

    3. Microsoft also has an all-in-one observability solution for Azure Functions and Web Apps: Application Insights.

    4. Dashboards let you combine multiple charts into a single view and share it with others.

    5. If you care about security, Azure offers Azure Security Center (since renamed Microsoft Defender for Cloud).
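To make the Log Analytics item above more concrete, here is a minimal sketch of how a KQL query could be submitted to a workspace over the Log Analytics query REST API. The workspace ID and token are hypothetical placeholders, the `build_query_request` helper is my own name, and the query assumes your VMs report to the `Heartbeat` table (which requires a monitoring agent). Acquiring the bearer token is not shown.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute your own workspace ID and token.
WORKSPACE_ID = "00000000-0000-0000-0000-000000000000"
ACCESS_TOKEN = "<bearer-token>"

# KQL: most recent heartbeat per machine.
# Assumes the VMs send data to the Heartbeat table of this workspace.
KQL = "Heartbeat | summarize LastSeen = max(TimeGenerated) by Computer"


def build_query_request(workspace_id, token, query, timespan="PT1H"):
    """Build (but do not send) a POST request for the workspace query endpoint."""
    url = f"https://api.loganalytics.io/v1/workspaces/{workspace_id}/query"
    # timespan is an ISO 8601 duration; PT1H means "the last hour".
    body = json.dumps({"query": query, "timespan": timespan}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


request = build_query_request(WORKSPACE_ID, ACCESS_TOKEN, KQL)
# Sending it needs a valid token:
# with urllib.request.urlopen(request) as response:
#     tables = json.load(response)["tables"]
```

The same query text can be pasted into the Logs blade of the workspace in the portal, which is usually the quicker way to experiment before automating anything.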

    Deployment/management strategy

    I suggest starting with:

    1. Create a Log Analytics Workspace, which is the storage for metrics and logs. The Azure docs article explains how to design it: how many instances to use, how to rate-limit ingestion (it might get expensive if it goes out of control), how to access it, and so on.

    2. To get logs from Azure components, look for the Diagnostic Settings tab on the component's page in the Azure portal. Note that not all components have it. I suggest:

      • sending the most critical data to a Log Analytics workspace to store it in a queryable format for 30 days (it's in the free tier). This is needed for investigating current issues with your infrastructure
      • if you might need logs for longer than 30 days, sending them to a Storage Account
      • since you mentioned SIEM integration, routing the required events to Event Hub and then processing the stream according to your requirements

      So, if you need long-term storage, you need to create an Azure Storage Account.

      If you need real-time analysis, you need to build a pipeline based on Azure Event Hub.

    3. If you have Azure Functions and Web Apps, add Application Insights. In my experience, it is best to start with a separate instance for each Azure Function resource or Service.

    4. Create Alerts for each component separately. If you do it through the UI, open the component's page in the portal and look for the Alerts tab there. If you're automating the process (please do so as soon as possible), do not expect an easy ride: I used ARM templates and Terraform, and in both cases there are dozens of barely documented features.

    5. Combine the core metrics of related components into Dashboards and share them with the team. This guide is a good starting point. Note that when you share a dashboard, it is also persisted as an Azure resource in the subscription.
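The routing described in step 2 (Log Analytics for querying, Storage Account for retention, Event Hub for the SIEM) maps onto a single diagnostic setting per resource. Below is a rough sketch of the request body that the diagnostic settings REST API and ARM templates expect under `Microsoft.Insights/diagnosticSettings`. The helper name and all resource IDs are hypothetical, and it assumes the target resource supports the `allLogs` category group and the `AllMetrics` category, which not every resource type does.

```python
import json


def build_diagnostic_setting(workspace_id=None, storage_account_id=None,
                             event_hub_rule_id=None, event_hub_name=None):
    """Return a diagnostic setting body that routes logs and metrics
    to whichever destinations are supplied."""
    properties = {
        # Assumption: the resource exposes the "allLogs" category group.
        "logs": [{"categoryGroup": "allLogs", "enabled": True}],
        # Assumption: the resource exposes the "AllMetrics" category.
        "metrics": [{"category": "AllMetrics", "enabled": True}],
    }
    if workspace_id:  # queryable short-term store
        properties["workspaceId"] = workspace_id
    if storage_account_id:  # long-term archive
        properties["storageAccountId"] = storage_account_id
    if event_hub_rule_id:  # streaming towards a SIEM
        properties["eventHubAuthorizationRuleId"] = event_hub_rule_id
        if event_hub_name:
            properties["eventHubName"] = event_hub_name
    destinations = {"workspaceId", "storageAccountId", "eventHubAuthorizationRuleId"}
    if not destinations & properties.keys():
        raise ValueError("at least one destination is required")
    return {"properties": properties}


# Hypothetical resource IDs, for illustration only.
SUB = "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
setting = build_diagnostic_setting(
    workspace_id=f"{SUB}/providers/Microsoft.OperationalInsights/workspaces/<workspace>",
    storage_account_id=f"{SUB}/providers/Microsoft.Storage/storageAccounts/<account>",
    event_hub_rule_id=f"{SUB}/providers/Microsoft.EventHub/namespaces/<namespace>"
                      "/authorizationRules/RootManageSharedAccessKey",
    event_hub_name="<event-hub>",
)
print(json.dumps(setting, indent=2))
```

Generating the body from one function keeps the three destinations consistent across resources, which helps once you start automating this with ARM templates or Terraform as suggested in step 4.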