What Azure Monitor alerts actually do well
Azure Monitor is one of the most-used monitoring stacks on the planet. It collects three streams: metrics (numerical, time-series), logs (text, queryable via KQL in Log Analytics), and traces (Application Insights). On top of those, you build alert rules that fire when a threshold is breached or a log query returns rows.
For infrastructure and application monitoring, this is exactly what you need:
| Use case | Azure Monitor coverage | Typical alert |
|---|---|---|
| VM CPU spike | Metric alert on Percentage CPU | "VM-prod-01 CPU > 90% for 5 min" |
| App exception | Application Insights log alert | "More than 10 errors in 1 min" |
| Pipeline runtime | Log alert on ADFActivityRun | "Activity duration > 30 min" |
| Cost anomaly | Cost Management alert | "Daily spend > $500" |
| Key Vault unauthorised access | Activity Log alert | "Vault accessed by unknown principal" |
When the question is "is this Azure resource healthy?", Azure Monitor alerts answer it well, quickly, and cheaply.
Microsoft's own Azure Monitor docs describe alert rules as "the way to proactively notify you when an important condition is found." Note the framing: a condition you defined, on a signal Azure already collects.
The blind spots that hurt data teams
The trouble starts when the question shifts from "is the resource healthy?" to "is the data correct?". Three categories of failure regularly reach business users without Azure Monitor noticing.
1. Silent data quality issues
A pipeline runs successfully. The job exits with code 0. The downstream table is populated. Azure Monitor sees a green tick. But the rows that should have arrived are missing because an upstream API silently changed a field name. There is nothing for Azure Monitor to alert on, because by its definition nothing went wrong.
2. Cross-tool lineage breaks
ADF feeds Databricks. Databricks writes to a Lakehouse. Power BI reads from the Lakehouse. Azure Monitor watches each of those independently. When the chain breaks at the Lakehouse-to-Power BI handoff (refresh fails, semantic model recompiles, dataset goes stale), no Azure Monitor alert connects the dots back to ADF or upstream Databricks.
3. Power BI refresh anomalies
Azure Monitor can be wired up to Power BI activity logs, but it doesn't natively understand:
- A refresh that succeeds but takes 3x longer than baseline
- A model that hasn't refreshed in 18 hours when it should every 6
- A capacity that throttled refreshes silently because of CU exhaustion
These are the symptoms data teams care about, and they're invisible to default Azure Monitor alerting.
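To make the first two refresh symptoms concrete, here is a minimal Python sketch of the checks involved. It assumes you already have the last successful refresh time and a short history of refresh durations from somewhere (for example, exported refresh history). The function names, the grace factor, and the 3x factor are illustrative choices, not part of any Azure or Power BI API.

```python
from datetime import datetime, timedelta, timezone
from statistics import median


def is_stale(last_refresh, expected_interval, now, grace=1.5):
    """Return True when the time since the last successful refresh
    exceeds the expected interval multiplied by a grace factor."""
    return now - last_refresh > expected_interval * grace


def is_slow(duration_s, history_s, factor=3.0):
    """Return True when the latest refresh took at least `factor`
    times the median of previous refresh durations (in seconds)."""
    if not history_s:
        return False  # no baseline yet, so nothing to compare against
    return duration_s >= factor * median(history_s)


# Hypothetical values: a model expected every 6 hours, last refreshed 18 hours ago.
now = datetime(2024, 1, 1, 18, 0, tzinfo=timezone.utc)
last = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
print(is_stale(last, timedelta(hours=6), now))  # True: 18h is past the 9h limit
print(is_slow(900, [280, 300, 320]))            # True: 900s is 3x the 300s median
```

Neither check depends on the refresh failing, which is the point: both fire on refreshes that a status-based alert would report as successful.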
A 2024 Wakefield Research / Monte Carlo survey found data teams report 70 incidents per 1,000 tables per year on average (Monte Carlo, 2024). Most of those incidents don't surface in infrastructure monitoring at all.
Why this happens (it's not Azure's fault)
Azure Monitor was designed around resources. A VM, a function app, a SQL database — each is a discrete object with predictable signals. The platform excels at watching those signals and alerting on threshold breaches.
Data reliability is a different shape. The thing you care about is not a single resource. It's a flow of data through many resources, where a failure in one of them produces an effect three layers downstream. The right unit of monitoring is the dataset, not the VM. Azure Monitor doesn't model datasets natively.
You can fight against this by writing complex KQL queries that join ADFActivityRun, DatabricksJobRun, PowerBIDatasetActivity, and AzureDiagnostics tables to approximate a dataset's lifecycle. Many teams do. The result is brittle, expensive (Log Analytics ingestion is not cheap), and hard to maintain.
| What you'd want to monitor | What Azure Monitor offers natively |
|---|---|
| "Sales daily.parquet is stale" | Storage account metrics — won't catch stale files |
| "Customer dim row count dropped" | None — requires custom KQL with row count history |
| "Pipeline produced bad data, dashboard now wrong" | None — Azure Monitor sees pipeline as 'succeeded' |
| "Model refresh time deviated from baseline" | Possible via custom Log Analytics query, no anomaly detection |
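The "row count history" in the table can be as simple as a stored list of previous daily counts. The sketch below is a hypothetical illustration of that comparison in Python. The function name and the 20% default are assumptions, and how you collect the counts (a scheduled query, a pipeline step) is left open.

```python
def row_count_dropped(history, current, max_drop=0.2):
    """Return True when `current` is more than `max_drop` (a fraction)
    below the mean of the previous row counts in `history`."""
    if not history:
        return False  # no history yet, so no baseline to compare with
    baseline = sum(history) / len(history)
    if baseline == 0:
        return False  # an empty baseline cannot drop any further
    return (baseline - current) / baseline > max_drop


# Hypothetical daily counts for a customer dimension table.
previous_days = [1000, 1020, 980]
print(row_count_dropped(previous_days, 700))  # True: 30% below the baseline of 1000
print(row_count_dropped(previous_days, 950))  # False: 5% below the baseline
```

The logic is trivial. The ongoing cost is in storing the history per table and keeping the threshold sensible for each one.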
What to build (or buy) to close the gap
There are three patterns teams use to extend Azure Monitor for data reliability. Each has trade-offs.
Pattern 1: Custom KQL + Logic Apps
Write scheduled KQL queries that compute dataset freshness, row counts, and refresh duration percentiles. Trigger Logic Apps when results breach thresholds. Send to Teams or email.
Pros: Full control, stays inside Azure ecosystem, billable to existing Log Analytics workspace.
Cons: Months of work to cover a real stack. KQL maintenance burden grows with every new pipeline. No anomaly detection out of the box. Lineage must be modeled by hand.
Pattern 2: Native Power BI alerting + Azure Function callbacks
Use Power BI's built-in alerts on dashboard cards. Hook them to Azure Functions that fan out to the rest of your stack.
Pros: Quick to set up for Power BI-only stacks.
Cons: Only works on numeric tile values. No cross-pipeline view. Alerts fire after a human-built dashboard reflects the issue, which means too late.
Pattern 3: Dedicated data observability tool on top
Layer a tool like MetricSign on top of Azure Monitor. The tool subscribes to the same Azure logs and metrics, plus PBI Service APIs, plus dbt manifests, plus ADF activities, and reasons about them as a connected pipeline rather than as discrete resources.
Pros: Days, not months, to first useful alert. Cross-tool lineage included. Anomaly detection on freshness and volume without manual thresholds.
Cons: Another tool to evaluate and justify. Some overlap with Azure Monitor on the basics.
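For a sense of what "anomaly detection without manual thresholds" can mean in practice, one common generic technique is a robust z-score based on the median absolute deviation (MAD). The Python sketch below illustrates that technique only. It is not a description of how MetricSign or any other product works, and the function name and the 3.5 cutoff are assumptions.

```python
from statistics import median


def mad_anomaly(history, value, threshold=3.5):
    """Flag `value` as anomalous when its robust z-score, computed from
    the median and median absolute deviation of `history`, exceeds
    `threshold`."""
    med = median(history)
    mad = median([abs(x - med) for x in history])
    if mad == 0:
        # All history values are effectively identical; any change is notable.
        return value != med
    # 0.6745 scales the MAD so the score is comparable to a standard z-score.
    score = 0.6745 * abs(value - med) / mad
    return score > threshold


# Hypothetical daily row volumes (in thousands) for one table.
volumes = [100, 102, 98, 101, 99, 103, 97]
print(mad_anomaly(volumes, 60))   # True: far outside the usual spread
print(mad_anomaly(volumes, 104))  # False: within normal variation
```

Because the cutoff is expressed in deviations from each table's own history, the same rule can run across many tables without a hand-tuned threshold per table.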
An IDC report estimates data engineers spend up to 30% of their time on incident triage and root cause analysis (IDC, 2023). Pattern 1 doesn't reduce that number. Pattern 3 is designed to.
When Azure Monitor alerts are enough
Not every team needs to add a layer. If your situation matches all of these, Azure Monitor on its own is fine:
- Your stack is mostly compute and infra (VMs, AKS, App Services), with little BI or analytics on top
- Data delivery is batch, simple, and rarely changes shape
- You have one source feeding one sink, with no cross-tool lineage to worry about
- You have an engineer who genuinely enjoys writing and maintaining KQL
If two or more of those statements feel like a stretch, the cost of staying in pure Azure Monitor will catch up with you, usually as a Tuesday morning incident that takes six hours to root-cause.
How MetricSign fits next to Azure Monitor
We built MetricSign for the case where Azure Monitor is your infrastructure layer and you need a data layer on top. The two are complementary.
Azure Monitor stays the source of truth for compute, networking, security, and cost. MetricSign reads from Power BI Service, ADF, Databricks, dbt (Cloud and Core), and Fabric — and provides the cross-tool lineage and freshness alerting that Azure Monitor doesn't model.
No replacement of existing alerts. No requirement to migrate anything off Log Analytics. We sit in front of the data layer where Azure Monitor sits in front of the infrastructure layer.
