
Power BI Monitoring: What to Track Beyond Refreshes

Most teams only monitor refresh failures. Here's what else can silently break your dashboards.

Why Refresh Status Is Not Enough

Most Power BI monitoring setups look the same: configure a failure alert, get an email when a refresh fails, call it done. For small teams with simple models, this works. The refresh either completes or it doesn't, and you know about it within the hour.

But as your data stack grows — more datasets, more reports, more dependencies on upstream pipelines — refresh status becomes an increasingly incomplete picture. A refresh can complete successfully while delivering data that's two days old. A dataset can pass every refresh check while quietly missing an entire dimension table because a column was renamed upstream.

The gap between "refresh succeeded" and "data is correct and fresh" is where most data quality incidents live. Unlike a hard failure, these silent problems don't announce themselves. They surface when a stakeholder notices something odd in a report, or worse, when a business decision has already been made on bad numbers.

Schema Changes: When Columns Vanish

A schema change is one of the most disruptive things that can happen to a Power BI dataset, and it rarely comes with a warning. When a data engineer renames a column in a SQL Server table, drops a view, or changes a data type, the downstream Power BI model may still refresh successfully — but something is broken.

The most common scenario: a column that a Power Query step depends on gets renamed in the source. Depending on how the query is structured, the column might silently return null values or the step might error with a cryptic Expression.Error: The column 'CustomerID' of the table wasn't found.

The harder-to-catch variant is when a new column appears upstream that should be in the model but isn't. Your dataset refreshes fine, but the model is missing data analysts expect to be there. Without schema change detection, you won't know until someone asks "where did the discount field go?"

Detecting schema changes requires comparing the set of columns returned by each query against a baseline captured at the last known-good state. A good monitoring system alerts the moment a column appears, disappears, or changes type — before anyone runs a report off stale structure.
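As a sketch, that baseline comparison can be as simple as diffing a per-table map of column names to types. The helper names and the in-memory baseline store below are illustrative, not any specific tool's API:

```python
# Hypothetical sketch: diff each query's current schema against a
# stored last-known-good baseline of {column: type} per table.
BASELINE = {
    "Sales": {"CustomerID": "int", "Amount": "decimal", "OrderDate": "date"},
}

def diff_schema(table: str, current: dict) -> dict:
    """Return columns that appeared, vanished, or changed type."""
    base = BASELINE.get(table, {})
    return {
        "added": sorted(set(current) - set(base)),
        "removed": sorted(set(base) - set(current)),
        "retyped": sorted(c for c in set(base) & set(current)
                          if base[c] != current[c]),
    }

# Upstream renamed OrderDate away, retyped Amount, and added Discount:
alerts = diff_schema("Sales", {"CustomerID": "int", "Amount": "float",
                               "Discount": "decimal"})
```

Any non-empty bucket in the result is worth an alert; the "added" bucket is what catches the harder variant above, where a new upstream column never makes it into the model.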

Volume Anomalies: When Numbers Are Quietly Wrong

Volume monitoring answers a simple question: does this dataset have approximately the right number of rows? If your sales fact table normally has 50,000 rows after a daily refresh and today it has 12,000, something went wrong upstream — but your refresh completed without errors.

This happens more often than you'd expect. Common causes include:

  • An ADF pipeline that copied an empty file because the source system didn't export on time
  • A filtered query that accidentally excluded most rows because a date parameter wasn't updated
  • A soft delete in the source database that wiped a partition
  • A staging table truncated but not reloaded before the Power BI refresh ran

Volume monitoring works by establishing a baseline — typically a rolling average of row counts over the past N refreshes — and alerting when the current count deviates significantly. A 15% drop might be normal fluctuation on weekends; a 70% drop almost never is.

The challenge is calibrating per dataset. A small lookup table with 200 rows shouldn't trigger on a 10-row change. A 50-million-row fact table warrants a tighter percentage band. Good volume monitoring adapts thresholds to each dataset's history rather than applying a single rule across the board.
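A minimal version of that adaptive threshold, assuming you already collect post-refresh row counts per dataset (the function name and threshold values here are illustrative):

```python
from statistics import mean, stdev

def volume_anomaly(history: list[int], current: int,
                   sigmas: float = 3.0, min_abs: int = 50) -> bool:
    """Flag a row count that deviates from this dataset's own baseline.
    `history` is the last N post-refresh counts; `min_abs` keeps a
    200-row lookup table from alerting on a 10-row change."""
    baseline = mean(history)
    spread = stdev(history) if len(history) > 1 else 0.0
    delta = abs(current - baseline)
    if delta < min_abs:
        return False  # too small to matter in absolute terms
    # With no usable spread, any non-trivial absolute change alerts.
    return spread == 0.0 or delta > sigmas * spread
```

A 50,000-row fact table dropping to 12,000 trips the check; normal day-to-day wobble stays within a few standard deviations of the dataset's own history and does not.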

Schedule Drift: When Refreshes Start Slipping

Schedule drift is subtle. Your dataset is configured to refresh at 06:00. On Monday it finishes at 06:14. Tuesday, 06:22. By Friday it's running until 07:15, and your morning report has been late for five days — but nobody configured an alert for this because the refreshes are still technically succeeding.

Drift happens for several reasons: the upstream data source is growing and queries take longer, gateway resources are shared with more concurrent jobs, or a slowly growing model is approaching memory limits during processing. None of these triggers a failure. They just make everything slower.

From a business impact perspective, schedule drift can be as damaging as an outage. If a Power BI report that powers a 09:00 standup is still refreshing at 08:50, the meeting starts without current data. If a financial close report that should run at midnight consistently finishes at 04:00, the window for review before market open shrinks.

Detecting drift requires tracking actual completion times over multiple refreshes and alerting when the average or worst-case duration exceeds a threshold — not just monitoring individual slow refreshes, but tracking the trend.

Stale Data: When Success Means Nothing

The sneakiest failure mode: the refresh succeeds, the row count looks right, the schema hasn't changed — but the data is from two days ago. This happens when the source system fails silently and serves cached or stale data to the query layer.

A concrete example: an on-premises SQL Server has a nightly ETL job that populates a staging table. The ETL failed last night, so the staging table still has yesterday's data. Power BI refreshes against that table, loads the rows successfully, and reports "refresh complete". Everything looks fine downstream until someone notices the max transaction date in a report is stuck.

Stale data detection requires monitoring a watermark column — typically a modified_at, created_at, or transaction_date field — and alerting when the maximum value hasn't advanced since the last refresh. This is specific to each dataset because different tables use different date patterns.

For Power BI specifically, this means inspecting the actual data values in the loaded model, not just whether the refresh completed. Most monitoring tools stop at the API layer and never look at the data itself.

Five Signals, One Complete Picture

Refresh status, schema changes, volume anomalies, schedule drift, and stale data aren't five separate monitoring problems — they're five signals that together give you a complete picture of dataset health. A dataset that passes all five checks is genuinely healthy. A dataset that passes only the first might be serving wrong data to hundreds of users.

The practical challenge is that each signal requires different data: API calls for refresh status, query-level metadata comparison for schema changes, row count comparisons for volume, timestamp analysis for schedule drift, and watermark inspection for staleness. Assembling this into a single monitoring view requires integrating multiple data sources.
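Once each signal produces a pass/fail, the rollup itself is trivial; the hard part is feeding it. A hypothetical shape for the combined verdict:

```python
def dataset_health(signals: dict[str, bool]) -> tuple[str, list[str]]:
    """Collapse the five signals into one verdict plus the failing ones."""
    failed = [name for name, ok in signals.items() if not ok]
    return ("healthy" if not failed else "degraded", failed)

# A passing refresh with a volume anomaly is degraded, not healthy:
status, failing = dataset_health({
    "refresh_status": True, "schema": True, "volume": False,
    "schedule": True, "freshness": True,
})
```

The point of the single verdict is precisely the claim above: "refresh_status: True" on its own says nothing about whether the other four signals are green.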

For most teams, the highest-value signals to add first are volume monitoring and stale data detection — these catch the silent failures with the most business impact. Schema change detection is critical for teams with frequent upstream changes. Schedule drift monitoring matters most when reports have hard SLA windows: board reports, financial closes, daily operational dashboards.

The goal isn't to add complexity. It's to close the gap between "the refresh succeeded" and "the data is correct."
