Stale data in Power BI is one of the hardest problems to detect because the refresh API reports success whenever the refresh operation itself completes, regardless of whether any new data actually arrived. A dataset can look completely current from Power BI's perspective while serving data that's days old.
Understanding the watermark approach
A watermark is the maximum value of a date field that represents "how recent is this data." For a transaction table, it's the maximum transaction date. For a log table, it's the maximum event timestamp. For a slowly changing dimension, it might be the maximum last_modified date.
After each refresh, the monitoring system queries the dataset for this maximum value (using the Power BI DAX query API) and compares it to the expected value based on the schedule:
- Daily refresh at 06:00: the max transaction date should be yesterday (or today, if the source loads intraday data)
- Hourly refresh: the max event timestamp should be within the last 2 hours
- Weekly refresh: the max date should be within the past week
If the maximum date hasn't advanced since the previous refresh, the new data wasn't loaded — even though the refresh completed without error.
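The check above can be sketched in Python against the `executeQueries` endpoint of the Power BI REST API. This is a sketch, not a drop-in implementation: the dataset id, the `Sales[TransactionDate]` column, and the tolerance values are placeholders, and the response parsing assumes the default `executeQueries` JSON shape (results → tables → rows keyed by the DAX column name).

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

# Hypothetical configuration -- substitute your workspace's dataset id,
# a valid AAD access token, and the real watermark table/column.
API = "https://api.powerbi.com/v1.0/myorg"
DATASET_ID = "00000000-0000-0000-0000-000000000000"
WATERMARK_DAX = 'EVALUATE ROW("watermark", MAX(Sales[TransactionDate]))'

def fetch_watermark(token: str) -> datetime:
    """Run a DAX query via executeQueries and return the dataset's max date."""
    body = json.dumps({"queries": [{"query": WATERMARK_DAX}]}).encode()
    req = urllib.request.Request(
        f"{API}/datasets/{DATASET_ID}/executeQueries",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    # executeQueries keys each row by the DAX column name, e.g. "[watermark]".
    value = result["results"][0]["tables"][0]["rows"][0]["[watermark]"]
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

def is_stale(watermark: datetime, max_lag: timedelta) -> bool:
    """True if the watermark trails the current time by more than the tolerance."""
    return datetime.now(timezone.utc) - watermark > max_lag
```

Run after each scheduled refresh and alert when `is_stale` returns True, with `max_lag` derived from the schedule (for example, `timedelta(hours=2)` for an hourly refresh).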
Common causes of stale data
Failed ETL upstream: An ADF pipeline or dbt job failed silently or produced no new output. The source table still has yesterday's data. Power BI refreshes it successfully.
Source system didn't export: The source application failed to generate its daily export file. The staging table has no new rows for today. ADF copied zero rows and reported success.
Incremental refresh partition error: A Power BI dataset using incremental refresh had a misconfigured policy that reloaded an old partition instead of the current one. The refresh completed, the row count is similar, but the data is from a previous period.
Timezone issue: The ETL runs in UTC but loads data using a local time filter. A timezone configuration change causes the filter to exclude today's records.
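The timezone failure mode is easy to reproduce. A minimal illustration (the dates and UTC+2 offset are invented for the example): a job running shortly after local midnight builds a "today" filter from the UTC clock, which is still on yesterday's date, so rows stamped with today's local date match nothing.

```python
from datetime import datetime, timedelta, timezone

# Illustrative only: a run at 00:30 local time in a UTC+2 zone.
local_now = datetime(2024, 6, 2, 0, 30, tzinfo=timezone(timedelta(hours=2)))
utc_now = local_now.astimezone(timezone.utc)

# "Today" differs depending on which clock builds the filter.
local_today = local_now.date()  # 2024-06-02
utc_today = utc_now.date()      # 2024-06-01

# A filter equivalent to WHERE event_date = <utc_today>, applied to rows
# stamped with the local date, excludes every record loaded "today".
row_dates = [local_today]
matched = [d for d in row_dates if d == utc_today]  # empty list
```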
Implementing watermark monitoring
Watermark monitoring requires knowing which column represents "freshness" for each dataset. This varies by dataset and typically requires configuration rather than automatic detection.
For datasets with a standard naming convention (tables with event_date, transaction_date, or modified_at columns), monitoring can auto-discover watermark candidates. For others, explicit configuration is needed.
The expected freshness window also varies: a dataset refreshing hourly should have a tighter tolerance than a weekly refresh. The monitoring system needs to know both the schedule and the expected lag to correctly classify a watermark as stale vs. acceptably old.