MetricSign

Azure Monitor Alerts: What It Catches, What It Misses, and What to Do Next

Azure Monitor is excellent at one thing: telling you when CPU goes up. The problems that actually wake data teams at night live in the gaps between what it watches and what your business sees.



What Azure Monitor alerts actually do well

Azure Monitor is one of the most widely used monitoring stacks on the planet. It collects three streams: metrics (numerical time series), logs (structured records queryable with KQL in Log Analytics), and traces (Application Insights). On top of those, you build alert rules that fire when a threshold is breached or a log query returns rows.

For infrastructure and application monitoring, this is exactly what you need:

| Use case | Azure Monitor coverage | Typical alert |
|---|---|---|
| VM CPU spike | Metric alert on Percentage CPU | "VM-prod-01 CPU > 90% for 5 min" |
| App exception | Application Insights log alert | "More than 10 errors in 1 min" |
| Pipeline runtime | Log alert on ADFActivityRun | "Activity duration > 30 min" |
| Cost anomaly | Cost Management alert | "Daily spend > $500" |
| Key Vault unauthorised access | Activity Log alert | "Vault accessed by unknown principal" |

When the question is "is this Azure resource healthy?", Azure Monitor alerts answer it well, quickly and cheaply.

Microsoft's own Azure Monitor docs describe alert rules as "the way to proactively notify you when an important condition is found." Note the framing: a condition you defined, on a signal Azure already collects.
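To make that framing concrete, here is a deliberately simplified model of the two rule types described above. It is not Azure's evaluation engine or SDK: `MetricAlertRule` and `log_rule_should_fire` are hypothetical names, and averaging the last few samples stands in for the aggregation a real metric alert performs over its evaluation window.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass
class MetricAlertRule:
    """Hypothetical static-threshold metric rule (a simplified model, not the Azure API)."""

    name: str
    threshold: float
    window_points: int  # number of 1-minute samples in the evaluation window

    def should_fire(self, samples: Sequence[float]) -> bool:
        # Not enough samples yet to fill the evaluation window.
        if len(samples) < self.window_points:
            return False
        # Average the most recent window and compare it to the static threshold.
        window = samples[-self.window_points:]
        return mean(window) > self.threshold


def log_rule_should_fire(rows: Sequence[dict], min_rows: int = 1) -> bool:
    """Hypothetical log-query rule: fire when the query returns at least `min_rows` rows."""
    return len(rows) >= min_rows


cpu_rule = MetricAlertRule("VM-prod-01 CPU > 90% for 5 min", threshold=90.0, window_points=5)
print(cpu_rule.should_fire([40, 92, 95, 97, 93, 91]))  # → True (mean of the last 5 samples is 93.6)

# An ADF run that succeeded but loaded the wrong data produces no error rows.
error_rows: list = []
print(log_rule_should_fire(error_rows))  # → False
```

The last line is the rest of this article in miniature: a pipeline that succeeds writes no error rows, so a rule keyed on errors has nothing to fire on.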

The blind spots that hurt data teams

The trouble starts when the question shifts from "is the resource healthy?" to "is the data correct?". Three categories of failure regularly reach business users without Azure Monitor noticing.

1. Silent data quality issues

A pipeline runs successfully. The job exits with code 0. The downstream table is populated. Azure Monitor sees a green tick. But the rows that should have arrived are missing because an upstream API silently changed a field name. There is nothing for Azure Monitor to alert on, because by its definition nothing went wrong.
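One common mitigation is to make the pipeline itself refuse to exit 0 when the shape of the data changes. The sketch below checks each batch against an expected field set; `EXPECTED_FIELDS`, the helper names and the sample records are all hypothetical, and in practice the contract would live alongside your pipeline code.

```python
from typing import Iterable, Mapping

# Hypothetical contract for an orders feed; replace with your own expected fields.
EXPECTED_FIELDS = {"order_id", "customer_id", "amount"}


def schema_drift(records: Iterable[Mapping[str, object]], expected: set) -> dict:
    """Return expected fields that never appeared and fields nobody declared."""
    seen: set = set()
    for record in records:
        seen.update(record.keys())
    return {"missing": expected - seen, "unexpected": seen - expected}


def assert_no_drift(records: list, expected: set = EXPECTED_FIELDS) -> None:
    """Raise so the orchestrator marks the run as failed instead of succeeded."""
    drift = schema_drift(records, expected)
    if drift["missing"]:
        raise ValueError(
            f"schema drift: missing {sorted(drift['missing'])}, "
            f"unexpected {sorted(drift['unexpected'])}"
        )


# Upstream API silently renamed customer_id to customerId.
batch = [
    {"order_id": 1, "customerId": 42, "amount": 10.0},
    {"order_id": 2, "customerId": 7, "amount": 5.5},
]
print(schema_drift(batch, EXPECTED_FIELDS))
# → {'missing': {'customer_id'}, 'unexpected': {'customerId'}}
```

Raising is the important design choice: an exception turns a silent data problem into a failed run, which Azure Monitor already knows how to alert on.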

2. Cross-tool lineage breaks

ADF feeds Databricks. Databricks writes to a Lakehouse. Power BI reads from the Lakehouse. Azure Monitor watches each of those independently. When the chain breaks at the Lakehouse-to-Power BI handoff (refresh fails, semantic model recompiles, dataset goes stale), no Azure Monitor alert connects the dots back to ADF or upstream Databricks.
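Connecting the dots requires a lineage graph that spans tools. Below is a minimal sketch, assuming you already have such a graph (the asset names and the `UPSTREAMS` mapping are hypothetical) plus a set of assets currently flagged as unhealthy. It walks upstream from the broken Power BI report and returns the most upstream failing asset, which is where triage should start.

```python
from collections import deque

# Hypothetical cross-tool lineage: each asset maps to the assets it reads from.
UPSTREAMS = {
    "pbi:sales_report": ["lakehouse:sales_gold"],
    "lakehouse:sales_gold": ["databricks:build_sales_gold"],
    "databricks:build_sales_gold": ["adf:ingest_sales"],
    "adf:ingest_sales": [],
}


def root_causes(asset: str, unhealthy: set, upstreams: dict = UPSTREAMS) -> list:
    """Walk upstream from `asset`; return unhealthy assets whose direct upstreams are all healthy."""
    seen = set()
    queue = deque([asset])
    causes = []
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        parents = upstreams.get(node, [])
        # An unhealthy asset with only healthy inputs is where the breakage most likely started.
        if node in unhealthy and not any(p in unhealthy for p in parents):
            causes.append(node)
        queue.extend(parents)
    return causes


# The report is stale, and so is everything back to the Databricks job; ADF looks fine.
broken = {"pbi:sales_report", "lakehouse:sales_gold", "databricks:build_sales_gold"}
print(root_causes("pbi:sales_report", broken))  # → ['databricks:build_sales_gold']
```

If the ADF run had failed too, the walk would return `adf:ingest_sales` instead, because this simplified rule only blames an asset when none of its direct upstreams are unhealthy.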

3. Power BI refresh anomalies

Azure Monitor can be wired up to Power BI activity logs, but it doesn't natively understand:

  • A refresh that succeeds but takes 3x longer than baseline
  • A model that hasn't refreshed in 18 hours when it should refresh every 6
  • A capacity that silently throttled refreshes because of CU exhaustion

These are the symptoms data teams care about, and they're invisible to default Azure Monitor alerting.
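The first symptom, a refresh that succeeds but runs far slower than usual, is straightforward to express once you keep a history of refresh durations. A minimal sketch, assuming durations in seconds ordered oldest to newest, a median baseline, and the 3x factor from the list above; both the baseline and the factor are tuning choices, not a standard, and `slow_refresh` is a hypothetical helper. Staleness and capacity throttling need different inputs and are not covered here.

```python
from statistics import median
from typing import Optional


def slow_refresh(durations_s: list, factor: float = 3.0) -> Optional[str]:
    """Flag the latest refresh if it took at least `factor` times the median of earlier runs."""
    if len(durations_s) < 2:
        return None  # need at least one earlier run to form a baseline
    *history, latest = durations_s
    baseline = median(history)
    if baseline > 0 and latest >= factor * baseline:
        return f"refresh took {latest / baseline:.1f}x the median of {len(history)} earlier runs"
    return None


# Hypothetical history: about 5 minutes per refresh, then a 16-minute run that still "succeeded".
print(slow_refresh([300, 310, 290, 305, 960]))
# → refresh took 3.2x the median of 4 earlier runs
```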

A 2024 Wakefield Research / Monte Carlo survey found data teams report 70 incidents per 1,000 tables per year on average. (Monte Carlo, 2024) Most of those incidents don't surface in infrastructure monitoring at all.

Why this happens (it's not Azure's fault)

Azure Monitor was designed around resources. A VM, a function app, a SQL database — each is a discrete object with predictable signals. The platform excels at watching those signals and alerting on threshold breaches.

Data reliability is a different shape. The thing you care about is not a single resource. It's a flow of data through many resources, where a failure in one of them produces an effect three layers downstream. The right unit of monitoring is the dataset, not the VM. Azure Monitor doesn't model datasets natively.

You can fight against this by writing complex KQL queries that join tables such as ADFActivityRun, DatabricksJobs, PowerBIDatasetsWorkspace, and AzureDiagnostics to approximate a dataset's lifecycle. Many teams do. The result is brittle, expensive (Log Analytics ingestion is billed per GB and is not cheap), and hard to maintain.

| What you'd want to monitor | What Azure Monitor offers natively |
|---|---|
| "Sales daily.parquet is stale" | Storage account metrics; won't catch stale files |
| "Customer dim row count dropped" | None; requires custom KQL with row count history |
| "Pipeline produced bad data, dashboard now wrong" | None; Azure Monitor sees the pipeline as 'succeeded' |
| "Model refresh time deviated from baseline" | Possible via a custom Log Analytics query; no anomaly detection |

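The "row count dropped" check, for example, needs only a stored history of daily row counts per table. A minimal sketch, assuming that history is kept somewhere you can read it (a small control table, say); `row_count_dropped` is a hypothetical helper, and the 30% drop and 3-sigma limits are illustrative assumptions you would tune per table.

```python
from statistics import mean, stdev


def row_count_dropped(history: list, today: int, max_drop: float = 0.3, z_limit: float = 3.0) -> bool:
    """Flag today's row count if it falls well below the trailing history.

    Two simple rules, both assumptions to tune per table:
    - a relative drop of more than `max_drop` versus the trailing mean, or
    - a value more than `z_limit` standard deviations below the trailing mean.
    """
    if len(history) < 2:
        return False  # stdev needs at least two points
    mu = mean(history)
    sigma = stdev(history)
    relative_drop = (mu - today) / mu if mu else 0.0
    z_score = (mu - today) / sigma if sigma else 0.0
    return relative_drop > max_drop or z_score > z_limit


# Hypothetical week of row counts for a customer dimension table.
week = [10000, 10120, 9980, 10050, 10010, 9990, 10070]
print(row_count_dropped(week, today=10030))  # → False
print(row_count_dropped(week, today=6200))   # → True
```

The z-score rule catches small but unusual dips on very stable tables, while the relative rule still fires on large drops when history is noisy.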
What to build (or buy) to close the gap

There are three patterns teams use to extend Azure Monitor for data reliability. Each has trade-offs.

Pattern 1: Custom KQL + Logic Apps

Write scheduled KQL queries that compute dataset freshness, row counts, and refresh duration percentiles. Trigger Logic Apps when results breach thresholds. Send to Teams or email.

Pros: Full control, stays inside Azure ecosystem, billable to existing Log Analytics workspace.

Cons: Months of work to cover a real stack. KQL maintenance burden grows with every new pipeline. No anomaly detection out of the box. Lineage must be modeled by hand.
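As an illustration of the per-dataset logic Pattern 1 implies, here is the evaluation step in Python rather than KQL (where a scheduled query would use the `percentile()` aggregation), assuming refresh durations have already been pulled out of Log Analytics. It computes a 95th-percentile refresh duration and, on a breach, builds a JSON body you could POST to a Logic App's HTTP request trigger. The payload fields and `build_alert` are hypothetical, and the HTTP call itself is left out.

```python
import json
from statistics import quantiles
from typing import Optional


def p95(values: list) -> float:
    # quantiles(n=100) returns the 99 cut points between percentiles; index 94 is the 95th.
    return quantiles(values, n=100, method="inclusive")[94]


def build_alert(dataset: str, durations_s: list, threshold_s: float) -> Optional[str]:
    """Return a JSON alert body for a hypothetical Logic App trigger, or None if within threshold."""
    if len(durations_s) < 2:
        return None  # too few runs for a meaningful percentile
    value = p95(durations_s)
    if value <= threshold_s:
        return None
    return json.dumps({
        "dataset": dataset,
        "metric": "refresh_duration_p95_seconds",
        "value": round(value, 1),
        "threshold": threshold_s,
    })


# Hypothetical: 19 normal 10-minute runs and one 60-minute outlier.
runs = [600.0] * 19 + [3600.0]
print(build_alert("Sales daily", runs, threshold_s=700))
```

Even this small piece shows the maintenance burden listed under Cons: a value like `threshold_s` has to be chosen, and revisited, for every dataset.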

Pattern 2: Native Power BI alerting + Azure Function callbacks

Use Power BI's built-in alerts on dashboard cards. Hook them to Azure Functions that fan out to the rest of your stack.

Pros: Quick to set up for Power BI-only stacks.

Cons: Only works on numeric tile values. No cross-pipeline view. Alerts fire only after a human-built dashboard already reflects the issue, which is usually too late.

Pattern 3: Dedicated data observability tool on top

Layer a tool like MetricSign on top of Azure Monitor. The tool subscribes to the same Azure logs and metrics, plus Power BI Service APIs, dbt manifests, and ADF activity runs, and reasons about them as a connected pipeline rather than as discrete resources.

Pros: Days, not months, to first useful alert. Cross-tool lineage included. Anomaly detection on freshness and volume without manual thresholds.

Cons: Another tool to evaluate and justify. Some overlap with Azure Monitor on the basics.
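"Without manual thresholds" usually means the expected behaviour is learned from history. The simplest version of that idea is sketched below under our own assumptions; it is an illustration, not a description of how MetricSign implements it. The hypothetical `is_overdue` helper infers a dataset's usual refresh cadence from the median gap between past refreshes and flags the dataset once the current gap exceeds that cadence by a tolerance factor.

```python
from datetime import datetime, timedelta
from statistics import median


def is_overdue(refresh_times: list, now: datetime, tolerance: float = 1.5) -> bool:
    """Flag a dataset whose current gap since the last refresh exceeds its learned cadence."""
    if len(refresh_times) < 3:
        return False  # too little history to learn a cadence
    ordered = sorted(refresh_times)
    # Gaps between consecutive refreshes, in seconds.
    gaps = [(later - earlier).total_seconds() for earlier, later in zip(ordered, ordered[1:])]
    expected_gap = median(gaps)
    current_gap = (now - ordered[-1]).total_seconds()
    return current_gap > tolerance * expected_gap


# Hypothetical history: a model that has refreshed every 6 hours.
start = datetime(2024, 5, 1, 0, 0)
history = [start + timedelta(hours=6 * i) for i in range(8)]
print(is_overdue(history, now=history[-1] + timedelta(hours=18)))  # → True
print(is_overdue(history, now=history[-1] + timedelta(hours=7)))   # → False
```

Using the median gap keeps one missed or doubled refresh in the history from shifting the learned cadence much.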

An IDC report estimates data engineers spend up to 30% of their time on incident triage and root cause analysis. (IDC, 2023) Pattern 1 doesn't reduce that number. Pattern 3 is designed to.

When Azure Monitor alerts are enough

Not every team needs to add a layer. If your situation matches all of these, Azure Monitor on its own is fine:

  • Your stack is mostly compute and infra (VMs, AKS, App Services), with little BI or analytics on top
  • Data delivery is batch, simple, and rarely changes shape
  • You have one source feeding one sink, with no cross-tool lineage to worry about
  • You have an engineer who genuinely enjoys writing and maintaining KQL

If two or more of those statements feel like a stretch, the cost of staying in pure Azure Monitor will catch up with you, usually in the form of a Tuesday-morning incident that takes six hours to root-cause.

How MetricSign fits next to Azure Monitor

We built MetricSign for the case where Azure Monitor is your infrastructure layer and you need a data layer on top. The two are complementary.

Azure Monitor stays the source of truth for compute, networking, security, and cost. MetricSign reads from Power BI Service, ADF, Databricks, dbt (Cloud and Core), and Fabric — and provides the cross-tool lineage and freshness alerting that Azure Monitor doesn't model.

No replacement of existing alerts. No requirement to migrate anything off Log Analytics. We sit in front of the data layer where Azure Monitor sits in front of the infrastructure layer.

Connect your stack in 15 minutes →

Frequently asked questions

What are Azure Monitor alerts used for?
Azure Monitor alerts notify you when a metric crosses a threshold, a log query returns matching rows, or an activity log records a specific event. They cover infrastructure (CPU, memory, network), application telemetry (exceptions, response times), cost spikes, and Azure resource changes. They are not designed for data quality, dataset freshness, or cross-pipeline reliability.
Can Azure Monitor alert on Power BI refresh failures?
Yes, via the Power BI activity log piped into Log Analytics, but not by default. You configure a diagnostic setting on the Power BI tenant or capacity, then write KQL queries on the resulting tables. Native anomaly detection (e.g. 'this refresh took 3x longer than usual') is not provided and must be approximated with custom queries.
What is the difference between Azure Monitor and a data observability tool?
Azure Monitor watches resources: VMs, app services, storage accounts. A data observability tool watches data: freshness, volume, schema, distribution, and the lineage that connects them. Azure Monitor will tell you a pipeline ran. A data observability tool will tell you the data the pipeline produced is wrong, even if the pipeline succeeded.
How much does Azure Monitor cost for data team use cases?
It depends on how much data you ingest into Log Analytics. Pricing as of 2026 is roughly $2.30 per GB ingested (pay-as-you-go) for the standard Analytics tier, varying by region, with discounts at commitment tiers. Costs for data team workloads can run high if you ingest fine-grained Power BI activity logs and detailed ADF run telemetry. Always model the ingestion cost before building deep custom queries.
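A back-of-the-envelope version of that modelling, assuming the roughly $2.30/GB pay-as-you-go rate quoted above and a hypothetical 5 GB/day of Power BI and ADF diagnostic logs. It ignores any free allowance, retention charges and commitment-tier discounts, so treat it as an order-of-magnitude check only.

```python
def monthly_ingestion_cost(gb_per_day: float, price_per_gb: float = 2.30, days: int = 30) -> float:
    """Rough pay-as-you-go ingestion estimate: daily volume x unit price x days."""
    return gb_per_day * price_per_gb * days


# Hypothetical: 5 GB/day of Power BI and ADF telemetry.
print(f"${monthly_ingestion_cost(5):,.2f} per month")  # → $345.00 per month
```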
Should I replace Azure Monitor with a data observability tool?
No. Use them together. Azure Monitor stays the right tool for infrastructure, security, and cost monitoring. A data observability tool covers the data layer Azure Monitor doesn't natively model. Replacing one with the other usually creates new gaps, not fewer.
