Refresh success is a prerequisite, not a health signal
Power BI monitoring usually stops at refresh success or failure. A data observability tool — whether you build one or buy one — watches several signals at once, because the failures that hurt users rarely show up as a failed refresh.
Most Power BI monitoring setups are built around a single event: the refresh either fails or it doesn't. Configure a failure alert, get an email when something breaks, call it monitoring. For a small environment with two or three simple import-mode datasets, this approach holds. Scale past a dozen datasets fed by multiple upstream pipelines, and the gaps become expensive.
A refresh that succeeds means one thing: Power BI loaded data from the configured data source without raising an error. It says nothing about whether the data is correct, complete, or current. The upstream pipeline can run and copy zero rows. A source schema change can silently drop a column that your measures depend on. The dataset can load data that was already twelve hours old when the refresh ran. All of these return a green status in Power BI Service.
Data observability platforms extend this view beyond the refresh API — tracking freshness, volume, and schema correctness across the entire chain from pipeline to report.
There are four signals that tell you whether your data is actually healthy. Refresh status is only the entry point.
Schema changes break reports without raising an error
When a data engineer renames a column in a SQL Server table or removes a column from a view in Azure Synapse, Power BI's behavior depends on whether the missing field is referenced in a calculated column or measure. If it is, the model refresh fails with an expression error. If it isn't directly referenced (the column simply disappears from the raw table), the dataset reloads without error, the column is silently absent, and any report visual built on it goes blank or shows zero.
The detection pattern is straightforward: after every refresh, compare the column set in the loaded dataset against a known-good baseline. Any column present in the baseline but absent from the current load is a schema change. This check runs in seconds against the Power BI REST API and catches the failure class that expression errors don't.
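A minimal sketch of the comparison step, assuming you have already retrieved the current column list (for example by running a DAX query through the Power BI Execute Queries REST endpoint) and stored a known-good baseline. Identifying columns as `Table[Column]` strings and the helper name `diff_schema` are illustrative choices, not part of any Power BI API.

```python
from __future__ import annotations


def diff_schema(baseline: set[str], current: set[str]) -> dict[str, set[str]]:
    """Compare the loaded column set against a known-good baseline.

    Columns are identified by hypothetical "Table[Column]" strings.
    A rename shows up as one missing name plus one added name.
    """
    return {
        "missing": baseline - current,  # in the baseline but absent now: alert on these
        "added": current - baseline,    # new columns: usually informational
    }


baseline = {"Sales[OrderDate]", "Sales[Region]", "Sales[Amount]"}
current = {"Sales[OrderDate]", "Sales[SalesRegion]", "Sales[Amount]"}

changes = diff_schema(baseline, current)
if changes["missing"]:
    print("Schema change, missing columns:", sorted(changes["missing"]))
```

Only the missing set needs to raise an alert; the added set helps tell a rename apart from a dropped column.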
Schema changes originating in dbt models are a common source of this problem. A dbt developer renames a field in a staging model. The downstream Power BI dataset refreshes successfully. Three reports that reference that field show zeroes. The first person to notice is a sales manager opening a Monday morning forecast.
Volume anomalies catch the number that shouldn't be zero
Volume monitoring answers one question: does this dataset have approximately the right number of rows? The question is simple. The answer prevents an entire category of silent failure.
Your sales fact table normally loads 48,000–52,000 rows after the daily refresh. If today it loads 12,000 rows, something is wrong upstream — a filter was added to the pipeline, a partition failed to load, a source table was truncated. The Power BI refresh completed successfully. Nothing in the API log shows a problem. The only signal is the row count.
Volume baselines need calibration: day-of-week variation matters (Monday loads are typically larger), and seasonal patterns affect some datasets. A naive threshold that fires whenever the row count drops 20% from the prior day will generate false positives every Tuesday, when volume falls back from the Monday peak. The right baseline compares the current load to the median for the same day of week over the trailing 4–6 weeks. That absorbs normal weekly variation while still catching genuine drops.
The threshold doesn't need to be exact to be useful. A drop to 25% of the expected row count almost always indicates a real problem regardless of baseline precision.
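The same-weekday baseline fits in a few lines of Python. This sketch assumes you record one row count per dataset per day (the `history` mapping is hypothetical) and treats `min_ratio` as a tunable assumption rather than a recommended value; a load at 25% of baseline trips it comfortably.

```python
from __future__ import annotations

from datetime import date, timedelta
from statistics import median


def weekday_baseline(history: dict[date, int], today: date, weeks: int = 6) -> float | None:
    """Median row count for the same weekday over the trailing `weeks` weeks."""
    same_weekday = (today - timedelta(weeks=w) for w in range(1, weeks + 1))
    samples = [history[d] for d in same_weekday if d in history]
    return median(samples) if samples else None


def is_volume_anomaly(current: int, baseline: float | None, min_ratio: float = 0.5) -> bool:
    """Flag a load below min_ratio of the baseline (min_ratio is a hypothetical default)."""
    if not baseline:  # no history yet, or a zero baseline: nothing meaningful to compare
        return False
    return current / baseline < min_ratio


today = date(2024, 6, 3)  # a Monday
counts = [50_000, 49_000, 51_000, 52_000, 48_000, 50_500]  # the previous six Mondays
history = {today - timedelta(weeks=w): c for w, c in zip(range(1, 7), counts)}

baseline = weekday_baseline(history, today)   # 50250.0
print(is_volume_anomaly(12_000, baseline))    # True: roughly 24% of baseline
print(is_volume_anomaly(49_500, baseline))    # False
```

Using the median rather than the mean keeps one bad week in the trailing window from dragging the baseline down.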
Schedule drift is invisible until the SLA is already blown
Your dataset is configured to refresh at 06:00. The refresh completes at 06:14 on Monday. By Wednesday it finishes at 06:47. By Friday it runs until 07:15, and the report your finance team opens at 07:00 is still showing the previous day's data. No alert has fired. Every refresh succeeded.
Schedule drift happens when refresh duration grows gradually — model complexity increases, data volumes grow, upstream queries slow down as tables get larger. The failure is invisible because it's incremental. No single day crosses an obvious threshold. The drift accumulates until it crosses the business SLA.
Detecting drift requires tracking refresh duration over time and alerting on trend, not just on individual values. A single refresh that takes 75 minutes isn't an anomaly if previous refreshes averaged 70 minutes. A refresh that takes 75 minutes when the six-week average was 15 minutes is. The signal is deviation from historical baseline, not deviation from an arbitrary maximum.
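One way to implement this, sketched under assumptions: the input mirrors entries returned by the Power BI Get Refresh History endpoint (`status`, `startTime` and `endTime` fields, with `Completed` assumed to mark success), and the 42-run window (six weeks of daily refreshes) and 2× ratio are hypothetical defaults. Comparing against the trailing median rather than a fixed maximum is what makes the 70-minute and 15-minute cases above come out differently.

```python
from __future__ import annotations

from datetime import datetime
from statistics import median


def parse_ts(value: str) -> datetime:
    # Timestamps are assumed to be ISO 8601 UTC strings ending in "Z";
    # swapping "Z" for "+00:00" keeps fromisoformat working before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def completed_durations(refreshes: list[dict]) -> list[float]:
    """Minutes taken by each completed refresh, oldest first."""
    done = [r for r in refreshes if r.get("status") == "Completed" and r.get("endTime")]
    done.sort(key=lambda r: parse_ts(r["startTime"]))
    return [(parse_ts(r["endTime"]) - parse_ts(r["startTime"])).total_seconds() / 60 for r in done]


def duration_drift(durations: list[float], window: int = 42, max_ratio: float = 2.0) -> bool:
    """Flag the latest run if it exceeds max_ratio times the median of the prior `window` runs."""
    if len(durations) < 2:
        return False
    latest, prior = durations[-1], durations[:-1][-window:]
    return latest > max_ratio * median(prior)


refreshes = [
    {"status": "Completed", "startTime": "2024-06-05T06:00:00Z", "endTime": "2024-06-05T06:47:00Z"},
    {"status": "Failed", "startTime": "2024-06-04T06:00:00Z", "endTime": "2024-06-04T06:02:00Z"},
    {"status": "Completed", "startTime": "2024-06-03T06:00:00Z", "endTime": "2024-06-03T06:14:00Z"},
]
print(completed_durations(refreshes))            # [14.0, 47.0]
print(duration_drift([70.0] * 42 + [75.0]))      # False: in line with history
print(duration_drift([15.0] * 42 + [75.0]))      # True: five times the baseline
```

With a 15-minute history, Wednesday's 47-minute run in the example above already exceeds twice the median, so this check would fire days before the 07:00 deadline is actually missed.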
Stale data: when the source fails and the refresh doesn't notice
The most difficult failure mode to detect: the refresh succeeds, the row count looks correct, the schema is unchanged — but the data is from two days ago. This happens when the source system stops updating while the pipeline continues running. The pipeline copies the same data it copied yesterday. The refresh loads it without error. The timestamp column that would reveal the problem is hidden in the raw table that nobody has looked at.
Detecting stale data requires a watermark check: read the maximum value of the date or timestamp column that should reflect the most recent data, and compare it against the current time. If your daily sales dataset refreshes at 06:00 and the maximum transaction date is yesterday at noon, the data is already eighteen hours old before your users open the report.
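A sketch of the comparison step, assuming the maximum timestamp has already been retrieved, for example by sending a DAX query like the one in the comment through the Execute Queries REST endpoint. The `'Sales'[TransactionDate]` column and the 12-hour `max_age` are hypothetical; set the allowed lag from the dataset's own load cadence.

```python
from datetime import datetime, timedelta

# Hypothetical DAX query for the watermark (table and column names are placeholders).
WATERMARK_DAX = "EVALUATE ROW(\"MaxDate\", MAX('Sales'[TransactionDate]))"


def data_age(max_event_time: datetime, now: datetime) -> timedelta:
    """How far the newest record lags behind the current time.

    Both values must use the same timezone convention (both naive or both aware).
    """
    return now - max_event_time


def is_stale(max_event_time: datetime, now: datetime, max_age: timedelta) -> bool:
    return data_age(max_event_time, now) > max_age


now = datetime(2024, 6, 4, 6, 0)       # refresh completes at 06:00
newest = datetime(2024, 6, 3, 12, 0)   # latest transaction: yesterday at noon
print(data_age(newest, now))                        # 18:00:00
print(is_stale(newest, now, timedelta(hours=12)))   # True
```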
The watermark check adds one query to your monitoring stack and catches the failure class that all other signals miss. Combined with refresh status, schema change detection, and volume monitoring, it closes the four most common paths to silent data failure.
Four signals give you a complete picture. Refresh status alone gives you one quarter of it.
Refresh status (together with the refresh durations that expose schedule drift), schema change detection, volume monitoring, and watermark checks address different failure modes, and each can fire independently. A refresh can succeed (status green) while loading a reduced row count (volume anomaly). A schema change can go undetected while the watermark looks healthy. Stale data can coexist with a perfect row count.
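To illustrate that independence, here is a hypothetical result type that records each check separately instead of collapsing everything into a single pass/fail flag. The class and field names are illustrative; the inputs would come from checks like the ones described in the earlier sections.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DatasetHealth:
    """Hypothetical per-dataset result: one field per signal, evaluated independently."""
    refresh_ok: bool
    duration_drift: bool = False
    missing_columns: set[str] = field(default_factory=set)
    volume_anomaly: bool = False
    stale: bool = False

    def problems(self) -> list[str]:
        issues = []
        if not self.refresh_ok:
            issues.append("refresh failed")
        if self.duration_drift:
            issues.append("refresh duration drift")
        if self.missing_columns:
            issues.append("missing columns: " + ", ".join(sorted(self.missing_columns)))
        if self.volume_anomaly:
            issues.append("volume anomaly")
        if self.stale:
            issues.append("stale data")
        return issues

    @property
    def healthy(self) -> bool:
        return not self.problems()


# A green refresh that loaded too few rows is still unhealthy.
result = DatasetHealth(refresh_ok=True, volume_anomaly=True)
print(result.healthy, result.problems())  # False ['volume anomaly']
```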
The four checks are lightweight to implement separately. Combined, they give you the complete picture of whether a dataset is healthy — not just whether the loader ran without an error. For most Power BI environments, implementing these four checks against the REST API and a configured data source is a day of engineering work. The alternative is discovering problems from your users.
