Practice 1: Monitor More Than Refresh Status
Power BI's built-in notifications tell you when a refresh fails. They don't tell you when a refresh succeeds but loads wrong data. Start by adding three checks to your monitoring stack.
Row count comparison: After each refresh, compare the loaded row count against the previous refresh. Alert on drops greater than a percentage threshold. What that threshold should be depends on your dataset — 10% is a reasonable starting point for large fact tables. Lookup tables warrant a tighter band.
Watermark check: For any dataset that loads incremental data, monitor the maximum value of the most recent date field. If a refresh completes but the max date hasn't advanced, the new data is stale even if the refresh technically succeeded.
Refresh duration trend: Track how long each refresh takes over time. A refresh that used to take 8 minutes and now takes 45 is heading toward a timeout. Catch the trend before it becomes a failure.
The implementation doesn't need to be complex. A script that queries the Power BI REST API after each refresh, compares the result against the previous run, and writes a flag to a monitoring table is a working starting point. You can build from there.
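The comparison step of such a script can be sketched as pure logic. The sketch below assumes the row count, watermark, and duration for each run have already been collected elsewhere (for example from the Power BI REST API refresh history plus a DAX `COUNTROWS` query) and stored in a monitoring table; the `RefreshStats` type and the default thresholds are illustrative, not prescriptive:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class RefreshStats:
    """One refresh's vitals, gathered by an upstream collection step
    (hypothetical schema; adapt to your monitoring table)."""
    row_count: int
    max_watermark: datetime   # max value of the most recent date field
    duration_minutes: float

def check_refresh(prev: RefreshStats, curr: RefreshStats,
                  drop_threshold: float = 0.10,
                  duration_factor: float = 2.0) -> list[str]:
    """Return alert messages; an empty list means the refresh looks healthy."""
    alerts = []
    # Check 1: row count dropped more than the threshold since the last run.
    if prev.row_count and curr.row_count < prev.row_count * (1 - drop_threshold):
        alerts.append(f"row count fell {1 - curr.row_count / prev.row_count:.0%}")
    # Check 2: the watermark did not advance, so the load is likely stale.
    if curr.max_watermark <= prev.max_watermark:
        alerts.append("watermark did not advance; loaded data may be stale")
    # Check 3: duration grew sharply, heading toward a timeout.
    if curr.duration_minutes > prev.duration_minutes * duration_factor:
        alerts.append(f"duration grew to {curr.duration_minutes:.0f} min")
    return alerts
```

Writing the result of `check_refresh` to a flag column in the monitoring table gives you the working starting point described above; alert routing can be layered on later.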
Practice 2: Track Schema Changes
Schema changes are one of the fastest ways to break a Power BI report, and they almost always originate upstream — in a SQL Server database, a dbt model, an Azure Synapse view, or a SharePoint list. By the time the change reaches Power BI, the model refresh might already have loaded incorrect data.
Set up schema change detection by:
- Capturing a baseline of the columns and data types returned by each data source query at the last known-good state
- After each refresh, comparing the actual columns against the baseline
- Alerting immediately if a column disappears, is renamed, or changes type
- Also alerting if a new column appears that isn't yet mapped in the model — a missed opportunity rather than a failure, but worth knowing
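The comparison at the heart of those steps is a diff between two column-to-type mappings. A minimal sketch, assuming the baseline and current schemas have already been captured (for example from `INFORMATION_SCHEMA.COLUMNS` or the dataset's metadata) as plain dictionaries:

```python
def diff_schema(baseline: dict[str, str],
                current: dict[str, str]) -> dict[str, list[str]]:
    """Compare {column_name: data_type} mappings from the last known-good
    state and the latest refresh. Capture of the mappings themselves is
    assumed to happen elsewhere."""
    return {
        # Columns that disappeared or were renamed: break the model.
        "missing": sorted(baseline.keys() - current.keys()),
        # New columns not yet mapped: a missed opportunity, not a failure.
        "new": sorted(current.keys() - baseline.keys()),
        # Columns whose type changed: often break measures silently.
        "retyped": sorted(c for c in baseline.keys() & current.keys()
                          if baseline[c] != current[c]),
    }
```

Alerting logic then only needs to check whether `missing` or `retyped` is non-empty, treating `new` as informational.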
For teams using dbt Cloud or dbt Core, the manifest file is a valuable upstream signal. The manifest documents every column in every model, including descriptions and types. When the manifest changes — because a dbt model was modified — you can diff before and after to understand what changed upstream before the Power BI refresh even runs.
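Diffing two manifest snapshots can be sketched as follows. Note the caveat that columns appear in `manifest.json` only when they are declared in the model's `.yml` docs, so coverage depends on how thoroughly the dbt project is documented:

```python
def manifest_columns(manifest: dict) -> dict[str, set]:
    """Extract {model_name: documented column names} from a parsed
    dbt manifest.json (the "nodes" section holds models, tests, seeds...)."""
    return {
        node["name"]: set(node.get("columns", {}))
        for node in manifest["nodes"].values()
        if node["resource_type"] == "model"
    }

def changed_models(before: dict, after: dict) -> list:
    """Models whose documented column set differs between two manifests."""
    return sorted(m for m in before.keys() & after.keys()
                  if before[m] != after[m])
```

Running `changed_models` on the manifests from two consecutive dbt runs flags upstream schema changes before Power BI ever refreshes against them.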
Catching schema drift before a refresh runs is better than catching it after. A lineage-aware monitoring system can alert: "the dbt model that feeds this dataset changed its schema — review before the scheduled refresh at 06:00."
Practice 3: Set Volume Baselines Per Dataset
Volume baselines answer a fundamental question: does this dataset have approximately the right amount of data? Getting this right requires thinking about what "right" means for each specific dataset — not applying a single rule across your entire Power BI environment.
For a daily transaction fact table, the right volume might be "within 20% of the average row count for this day of week." Weekends have different volumes than weekdays. Holiday periods are different from normal weeks. A good baseline accounts for these patterns by using a day-of-week-aware rolling average rather than a simple 7-day mean.
For a slowly changing dimension table, the right volume is "should never decrease by more than a handful of rows, and should grow steadily over time." A sudden drop in a dimension table is almost always a data quality problem.
For a snapshot table that's fully reloaded on each refresh, the right volume is "should be within a tight band of the last refresh." If a full-reload snapshot suddenly loses 30% of rows, either the source data changed significantly or a query filter was misconfigured.
Start simple: track row counts per dataset per refresh, compute a rolling 7-day average, alert on deviations greater than 15%. Tune the thresholds over time as you learn each dataset's natural variation. Early false positives are valuable — each one teaches you something about how your data actually behaves.
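A day-of-week-aware version of that baseline can be sketched in a few lines. The function below is a starting point under the same assumptions as above: row counts per refresh are already stored somewhere, thresholds need per-dataset tuning, and the dataset has enough history for a same-weekday comparison to mean anything:

```python
from datetime import date
from statistics import mean
from typing import Optional

def volume_alert(history: list, today: date, today_count: int,
                 threshold: float = 0.15,
                 min_samples: int = 3) -> Optional[str]:
    """Compare today's row count against a rolling average of loads for the
    SAME day of week (Saturdays compared to Saturdays), so weekend/weekday
    volume patterns don't trigger false alarms. history is a list of
    (load_date, row_count) tuples."""
    same_dow = [n for d, n in history if d.weekday() == today.weekday()]
    if len(same_dow) < min_samples:
        return None  # not enough history yet; stay quiet rather than guess
    baseline = mean(same_dow[-4:])  # rolling average of recent same-weekday loads
    deviation = abs(today_count - baseline) / baseline
    if deviation > threshold:
        return (f"row count {today_count} deviates {deviation:.0%} "
                f"from weekday baseline {baseline:.0f}")
    return None
```

Swapping the same-weekday filter for a plain 7-day window recovers the simple version; start with whichever matches the dataset's volatility and tune from the false positives.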
Practice 4: Monitor End-to-End Latency
Power BI users care about one thing: is the data current when they open the report? The answer depends on the entire chain — when the source system updated, how long the pipeline took, how long Power BI took to refresh. Monitoring each link in isolation doesn't tell you whether the end-to-end SLA was met.
End-to-end latency monitoring requires tracking:
- When the source data was last updated (the data watermark in the source system)
- When the pipeline completed (ADF run completion time)
- When the Power BI refresh completed
- The total delta from source update to data-available-in-report
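Once those three timestamps are collected (source watermark, ADF run completion from the pipeline run API, Power BI refresh history), the daily check reduces to timestamp arithmetic. A sketch, assuming all timestamps are normalized to a single timezone and using the 2-hour SLA mentioned below as an illustrative default:

```python
from datetime import datetime, timedelta

def latency_report(source_updated: datetime, pipeline_done: datetime,
                   refresh_done: datetime,
                   sla: timedelta = timedelta(hours=2)) -> dict:
    """Split the source-to-report delay into per-hop segments and check the
    total against an SLA. Timestamp collection happens elsewhere and must
    use one consistent timezone."""
    return {
        "pipeline_lag": pipeline_done - source_updated,  # source -> ADF done
        "refresh_lag": refresh_done - pipeline_done,     # ADF done -> PBI done
        "total": refresh_done - source_updated,          # end-to-end
        "sla_met": (refresh_done - source_updated) <= sla,
    }
```

Keeping the per-hop segments, not just the total, is what makes the optimization analysis described below possible: the report shows immediately whether the pipeline or the Power BI refresh dominates the delay.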
For most teams, this doesn't need to be a real-time system. A daily check confirming "this dataset was refreshed within 2 hours of source data being available" is a significant improvement over assuming things are fine.
The chains with the highest latency risk are those with multiple hops: source → ADF → Azure SQL → Synapse Analytics → Power BI dataset → report. Each hop adds processing time and a potential delay point. When the end-to-end takes 4 hours and your refresh window is at 06:00 for an 08:00 report, you have no buffer. Any slowdown anywhere in the chain breaks the SLA.
Latency monitoring also reveals optimization opportunities. If your end-to-end takes 3 hours but 80% of that is Power BI processing time, adding resources to the Power BI capacity has a bigger impact than optimizing the ADF pipeline.
Practice 5: Build an Incident Response Playbook
Practices 1–4 reduce the frequency and severity of data incidents. Practice 5 is for when they still happen: have a documented process ready before the 06:45 Monday morning phone call, not during it.
A minimal playbook covers five things: the threshold for escalating vs. investigating quietly, a triage checklist for the first diagnostic steps, which reports are affected and who needs to know, remediation options with realistic time estimates, and communication templates for different stakeholder groups. For the full step-by-step framework — from detection through postmortem — see How to Set Up Incident Response for Data Pipeline Failures.