
Building an end-to-end monitoring solution with Azure Arc, Log Analytics and Workbooks - Part 5: Putting it all together

Wow! We covered a lot in this series.

Part 1 - Overview & Architecture

Part 2 - Data collection with Azure Arc

Part 3 - Data persistence in Log Analytics

Part 4 - Data visualization with Azure Workbooks

Time for a wrap-up and some troubleshooting.

Let's trace the data flow from start to finish to make sure everything connects:

  1. The Azure Monitor Agent runs on each Arc-enabled on-prem VM.
  2. The Data Collection Rule tells the agent what health data to gather — application pools, Windows services, and scheduled tasks.
  3. The agent collects that data on a regular interval and ships it to Azure.
  4. The DCR routes the incoming data to our custom table (OnPremHealthStatus_CL) in the Log Analytics Workspace.
  5. The Workbook queries that table and renders the dashboard.

If any link in that chain breaks, data stops flowing. The troubleshooting section below covers the most common failure points.

Troubleshooting checklist

No data appearing in the workbook: Start at the table. Run a basic OnPremHealthStatus_CL | take 10 query directly in Log Analytics. If there are no results, the issue is upstream of the Workbook — either the agent isn't sending data or the DCR isn't routing it correctly.
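
If the take 10 query does return rows but the workbook stays empty, check the workbook's own time range. If it returns nothing, a per-machine breakdown over a recent window shows whether any agent is writing to the table at all. A minimal sketch, assuming the table includes the standard Computer column (adjust if your schema from Part 3 differs):

    OnPremHealthStatus_CL
    | where TimeGenerated > ago(1h)
    | summarize Rows = count(), LastSeen = max(TimeGenerated) by Computer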

Query timeout: If queries are timing out, reduce the time range or add more specific filters. Log Analytics has query timeout limits (typically 3 minutes for portal queries).
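
As a rough illustration, constraining the window and projecting only the columns the workbook actually renders keeps queries fast. The column names below are placeholders; substitute the ones you defined in Part 3:

    OnPremHealthStatus_CL
    | where TimeGenerated > ago(4h)
    | project TimeGenerated, Computer, ComponentName, Status   // placeholder column names
    | take 500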

Conditional formatting not applying: Make sure the status values in your data exactly match the conditions you've set (case-sensitive). A status of running (lowercase) won't match a condition checking for Running.
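
One way to sidestep case mismatches is to normalize the status column inside the workbook query and write the conditional-formatting rules against the normalized values. A sketch, with Status as a placeholder column name:

    OnPremHealthStatus_CL
    | extend StatusNormalized = tolower(Status)   // placeholder column; write formatting conditions in lowercase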

Data is stale: Check the TimeGenerated column. If the most recent rows are hours old, the agent may have stopped collecting or lost connectivity to Azure. Check the agent health on the VM.
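
A freshness query per machine makes stale agents easy to spot; anything with a gap of more than a couple of collection intervals deserves a closer look:

    OnPremHealthStatus_CL
    | summarize LastRecord = max(TimeGenerated) by Computer
    | extend MinutesSinceLastRecord = datetime_diff('minute', now(), LastRecord)
    | order by MinutesSinceLastRecord desc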

Partial data (some VMs missing): Verify the DCR association. Not all VMs may have the rule associated. Check the DCR's resource associations in the portal.
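
If Azure Monitor Agent heartbeats are flowing into the same workspace, an anti-join against the Heartbeat table lists machines that are alive but not writing to the custom table, which is a quick way to spot a missing DCR association. Treat this as a sketch; it assumes both tables report the same Computer naming:

    Heartbeat
    | where TimeGenerated > ago(1h)
    | distinct Computer
    | join kind=leftanti (
        OnPremHealthStatus_CL
        | where TimeGenerated > ago(1h)
        | distinct Computer
      ) on Computer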

Agent version too old: The custom text logs data source type requires Azure Monitor Agent version 1.10 or higher. If your Arc-enabled VMs have an older agent version, update it through Azure Policy or manually via extension management in the portal.
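
If heartbeats are available in the workspace, the reported agent version per machine can be checked from the Heartbeat table instead of visiting each VM. The Category filter below is an assumption; drop it if your heartbeats are labeled differently:

    Heartbeat
    | where Category == "Azure Monitor Agent"   // assumed Category value for AMA heartbeats
    | summarize arg_max(TimeGenerated, Version) by Computer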

File permissions: The Azure Monitor Agent runs under the NT AUTHORITY\SYSTEM account on Windows. Make sure the log file your script writes to is readable by SYSTEM. If you're writing to C:\ProgramData\ or C:\MonitoringData\, this should be fine by default.

DCR not updating: If you edit a DCR after it's been associated with VMs, the agents don't pick up the changes instantly. It can take up to 5 minutes for the agent to refresh its configuration. If you need an immediate update, restart the Azure Monitor Agent service on the VM.

Firewall blocking outbound connections: The Azure Monitor Agent needs outbound HTTPS access to several Azure endpoints. If your on-prem network has restrictive egress rules, make sure the following domains are allowed:

  • *.ods.opinsights.azure.com (data ingestion)
  • *.oms.opinsights.azure.com (agent configuration)
  • *.monitoring.azure.com (Arc and agent management)

If you're using a proxy, the agent can be configured to route through it via the Arc agent's proxy settings.

Schema mismatches: If you see errors when the agent tries to write to the custom table, the data being sent doesn't match the table schema. Double-check that the column names and types in your DCR data source configuration match the table definition exactly.
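
To compare the two sides, dump the table's actual schema in Log Analytics and check it against the stream declaration in the DCR:

    OnPremHealthStatus_CL
    | getschema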

How can you further improve this setup?

This setup gives you a solid foundation. From here, there are a few natural directions to extend it:

Alerting: Layer Azure Monitor alerts on top of the same custom table. You can alert when a critical service enters a Stopped or Failed state, without needing a separate alerting tool.
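
A log search alert rule can reuse the same table. A minimal sketch of an alert query, with placeholder column names and status values; configure the rule to fire when the result count is greater than zero:

    OnPremHealthStatus_CL
    | where TimeGenerated > ago(15m)
    | where Status in ("Stopped", "Failed")   // placeholder column and values
    | summarize UnhealthyComponents = count() by Computer, ComponentName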

Historical trending: Extend the Workbook with time-series queries to track how component health has changed over days or weeks. This is useful for catching intermittent failures that a point-in-time view would miss.
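
A binned time-series over the same table, rendered as a time chart in an extra workbook tile, gives a quick view of health over time. Again, the column name and status value are placeholders:

    OnPremHealthStatus_CL
    | where TimeGenerated > ago(7d)
    | summarize UnhealthyCount = countif(Status != "Running") by bin(TimeGenerated, 1h)
    | render timechart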

Automation: Once you have reliable health data flowing into Log Analytics, you can trigger Logic Apps or Azure Functions in response to specific health events — auto-restarting a failed service, for example.

Maybe I’ll cover these extensions in future posts. For now, the five parts above should get you from zero to a working health dashboard.

Feel free to adapt the schema, queries, and workbook structure to fit your environment.

More information

Azure Monitor Agent Network Configuration - Azure Monitor | Microsoft Learn
