MetricSign

dbt Monitoring: What 'Job Succeeded' Doesn't Tell You

Every model in your dbt job ran successfully. The Power BI dashboard is serving last week's numbers. The failure happened between 'model succeeded' and 'data correct'.

Why 'job succeeded' is not the same as 'data correct'

A dbt job completes and the status shows Succeeded. Every model in the run executed without raising an exception. What that means, precisely: the SQL compiled, the queries ran against the warehouse, and no model failed with a fatal error.

What it does not mean: that the output tables contain the correct data, have the expected row counts, include all expected columns, or were produced within the freshness window your downstream BI users expect.

The gap is structural. dbt is a transformation tool: it executes SQL, runs the tests you define, and reports on that execution. It does not independently verify that the output matches business expectations unless you wrote tests that check for exactly that. Tests are also where most teams fall short: they write too few, write them too late, and stop maintaining them when the data changes.

The most common failure mode: a source table changes a column name or data type. The dbt model that reads from it still compiles and runs because it uses SELECT * or another pattern that never names the column explicitly. The model 'succeeds' and writes a result set that is missing the renamed column, has a type mismatch in an aggregation, or silently drops rows due to a JOIN that no longer matches. Everything downstream, including Power BI, loads the result and reports success.
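
One way to catch this class of change is to snapshot the column names and types a table exposes and compare the snapshots between runs. The following is a minimal sketch, not a dbt feature: the function name diff_columns and the sample dictionaries are hypothetical, and in practice the snapshots might be taken from catalog.json or from the warehouse's information schema.

```python
def diff_columns(previous: dict[str, str], current: dict[str, str]) -> dict[str, list]:
    """Compare two {column_name: data_type} snapshots of the same table."""
    removed = sorted(set(previous) - set(current))
    added = sorted(set(current) - set(previous))
    # Columns present in both snapshots whose declared type changed.
    retyped = sorted(
        (name, previous[name], current[name])
        for name in set(previous) & set(current)
        if previous[name] != current[name]
    )
    return {"removed": removed, "added": added, "retyped": retyped}


# Hypothetical snapshots: customer_id was renamed and amount changed type.
before = {"order_id": "integer", "customer_id": "integer", "amount": "numeric"}
after = {"order_id": "integer", "cust_id": "integer", "amount": "varchar"}

drift = diff_columns(before, after)
print(drift)
```

A rename shows up as one removed and one added column, which is usually enough of a signal to hold downstream refreshes until someone has looked at it.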

Effective dbt monitoring requires more than watching job status. A data observability platform sits above the dbt run and watches the output: row counts, column schemas, freshness, and the connection between dbt models and downstream BI consumption.

What native dbt monitoring gives you

dbt Cloud and dbt Core both provide run monitoring. dbt Cloud has a job run history, run notifications (email or webhook on job success, warning, or failure), and a model timeline that shows which models ran, which failed, and how long each took. dbt Core produces run artifacts — run_results.json, manifest.json, catalog.json — that can be parsed for the same information.

For dbt Cloud users, the built-in alerts cover:

  - Job-level pass/fail/warning status
  - Model-level failure (which model in the job threw a fatal error)
  - Test failure (which dbt tests failed in this run)
  - Job duration (rough signal for abnormal slowness)

For dbt Core users, the same information is in the run artifacts, but surfacing it requires tooling to parse and route the output — dbt test exit codes, run_results.json parsing, or an orchestrator that reads job status.
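
As an illustration of what that tooling has to do, the sketch below groups the entries of a run_results.json document by status and collects the failing ones. It assumes the artifact's results list carries unique_id, status, and message fields and that dbt writes it to target/run_results.json; the function names and the sample payload are hypothetical and hand-written, not real dbt output.

```python
import json
from pathlib import Path

# dbt records "error" for failed models and "fail" for failed tests.
FAILING_STATUSES = {"error", "fail"}


def summarize_run_results(run_results: dict) -> dict:
    """Group result unique_ids by status and collect failure messages."""
    by_status: dict[str, list[str]] = {}
    failures = []
    for result in run_results.get("results", []):
        status = result.get("status", "unknown")
        unique_id = result.get("unique_id", "?")
        by_status.setdefault(status, []).append(unique_id)
        if status in FAILING_STATUSES:
            failures.append((unique_id, result.get("message")))
    return {"by_status": by_status, "failures": failures}


def load_run_results(path: str = "target/run_results.json") -> dict:
    """Read the artifact from dbt's default target directory (assumed path)."""
    return json.loads(Path(path).read_text())


# Hand-written sample in the shape of run_results.json (illustrative only).
sample = {
    "results": [
        {"unique_id": "model.shop.orders", "status": "success", "message": "OK"},
        {"unique_id": "model.shop.customers", "status": "error", "message": "column not found"},
        {"unique_id": "test.shop.unique_orders_id", "status": "fail", "message": "3 failing rows"},
    ]
}

summary = summarize_run_results(sample)
print(summary["failures"])
```

The failures list is what you would route to a chat channel or an incident tool; an empty list means nothing failed, which is still not the same as the data being correct.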

This is a solid foundation. The gaps are:

  1. Test coverage is only as good as the tests you wrote. A model with no dbt tests will always 'pass' its test suite — there is nothing to fail.
  2. dbt tests run at a point in time. They do not watch for freshness lag between runs or anomalies in the volume trend.
  3. dbt monitoring does not see downstream. A dbt Cloud notification about a model failure does not tell you which Power BI datasets are now at risk.
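
The first gap can at least be measured. The sketch below lists the models that no test depends on, assuming manifest.json exposes a nodes mapping whose entries carry resource_type and depends_on.nodes. The function name and the sample manifest are hypothetical.

```python
def models_without_tests(manifest: dict) -> list[str]:
    """Return unique_ids of models that no test node depends on."""
    nodes = manifest.get("nodes", {})
    tested = set()
    for node in nodes.values():
        if node.get("resource_type") == "test":
            # Every node a test depends on counts as covered by that test.
            tested.update(node.get("depends_on", {}).get("nodes", []))
    return sorted(
        unique_id
        for unique_id, node in nodes.items()
        if node.get("resource_type") == "model" and unique_id not in tested
    )


# Hand-written sample in the shape of manifest.json (illustrative only).
sample_manifest = {
    "nodes": {
        "model.shop.orders": {"resource_type": "model"},
        "model.shop.customers": {"resource_type": "model"},
        "test.shop.not_null_orders_id": {
            "resource_type": "test",
            "depends_on": {"nodes": ["model.shop.orders"]},
        },
    }
}

untested = models_without_tests(sample_manifest)
print(untested)
```

A model on this list will always report a clean test run, so the list is a reasonable place to start when deciding where to add coverage.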

The tests that catch what job status misses

dbt tests are the first line of defence for output correctness, and the most commonly underinvested in. Three categories of tests have the highest leverage:

Schema tests (built-in): not_null, unique, accepted_values, relationships. Once declared in a model's YAML properties file, they run whenever dbt test or dbt build executes and catch the most obvious value-level problems. Every primary key should have not_null and unique. Every foreign key should have relationships. These tests ship with dbt Core and require no external packages.

Source freshness (built-in): dbt source freshness checks whether the upstream source tables have been updated within a configured window. This is the one freshness signal that is native to dbt: it reads the maximum value of the source's configured loaded_at_field and compares its age against the warn_after and error_after thresholds. Running dbt source freshness before dbt build surfaces stale source data before the models run.
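
The comparison itself is simple enough to sketch. The function below mirrors the threshold logic described above; it is an illustration rather than dbt's implementation, and the function name, the timestamps, and the 12-hour and 24-hour thresholds are assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone


def classify_freshness(
    max_loaded_at: datetime,
    now: datetime,
    warn_after: timedelta,
    error_after: timedelta,
) -> str:
    """Classify a source as pass, warn, or error based on the age of its newest row."""
    age = now - max_loaded_at
    if age > error_after:
        return "error"
    if age > warn_after:
        return "warn"
    return "pass"


# Hypothetical check: the newest row is 18 hours old.
now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
newest_row = now - timedelta(hours=18)

status = classify_freshness(
    newest_row,
    now,
    warn_after=timedelta(hours=12),
    error_after=timedelta(hours=24),
)
print(status)
```

Using timezone-aware datetimes on both sides avoids the common mistake of comparing a UTC warehouse timestamp against local time.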

Row count comparisons (packages or custom): dbt does not natively compare row counts between runs. The dbt_utils package provides generic tests such as at_least_one and equal_rowcount, which can catch an empty column or a row count mismatch between two relations, but a day-of-week baseline comparison requires a custom test or an observability tool. This is where native dbt monitoring reaches its limit.
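
To make the day-of-week baseline concrete, here is a minimal sketch that compares a run's row count against the mean of earlier runs on the same weekday. The function name, the 50% tolerance, and the history values are hypothetical; a production check would likely use a longer window and a more robust statistic than the mean.

```python
from datetime import date
from statistics import mean


def row_count_is_anomalous(
    history: list[tuple[date, int]],
    run_date: date,
    row_count: int,
    tolerance: float = 0.5,
) -> bool:
    """Flag a row count that deviates from the same-weekday mean by more than tolerance."""
    same_weekday = [
        count for day, count in history if day.weekday() == run_date.weekday()
    ]
    if not same_weekday:
        return False  # No baseline yet for this weekday, so nothing to compare.
    baseline = mean(same_weekday)
    if baseline == 0:
        return row_count != 0
    return abs(row_count - baseline) / baseline > tolerance


# Hypothetical history: busy Mondays, one quiet Saturday.
history = [
    (date(2024, 1, 1), 1000),  # Monday
    (date(2024, 1, 6), 100),   # Saturday
    (date(2024, 1, 8), 1100),  # Monday
    (date(2024, 1, 15), 900),  # Monday
]

zero_row_monday = row_count_is_anomalous(history, date(2024, 1, 22), 0)
normal_saturday = row_count_is_anomalous(history, date(2024, 1, 27), 100)
print(zero_row_monday, normal_saturday)
```

Keeping a separate baseline per weekday is what stops a normal weekend dip from raising an alert while still catching a zero-row write on a weekday.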

The practical approach: start with schema tests on every primary and foreign key, add source freshness checks for every source that has a known refresh schedule, and use an observability tool for row count baselines and cross-model volume anomalies. For teams running dbt with Databricks, this connects directly to the Databricks and Power BI monitoring gap — dbt models are one more layer where silent failures can propagate downstream.

Connecting dbt failures to downstream BI impact

A dbt model failure in isolation is a tractable problem. The model has a name, a status, and an error message. You fix the model and rerun the job.

The harder question is impact: which downstream assets are currently serving wrong data because this model failed? For a team with 40 dbt models and 20 Power BI datasets, answering that manually means checking datasource configurations one by one — a 45-minute investigation on a bad morning.

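The manifest.json artifact that dbt produces on every run contains the full DAG of model dependencies. Parsing this gives you forward traversal: starting from the failed model, which downstream models does it feed? And from those models, which tables or views are read by which Power BI datasets? The manifest answers the first question directly. The second needs a mapping from warehouse relations to Power BI datasets, which comes from dbt exposures if you declare them, or from the Power BI side if you do not.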
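
The forward traversal can be sketched as a breadth-first walk over the manifest's child_map, which maps each node's unique_id to the nodes that depend on it. This is an illustrative sketch, not how any particular tool does it: the function names are hypothetical, and dataset_sources stands in for a mapping from Power BI datasets to dbt models that would have to be maintained or extracted separately.

```python
from collections import deque


def downstream_nodes(child_map: dict[str, list[str]], start: str) -> set[str]:
    """Collect every node reachable from start by following child_map edges."""
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for child in child_map.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def datasets_at_risk(
    child_map: dict[str, list[str]],
    failed_model: str,
    dataset_sources: dict[str, list[str]],
) -> list[str]:
    """Return datasets that read the failed model or anything downstream of it."""
    affected = downstream_nodes(child_map, failed_model) | {failed_model}
    return sorted(
        name for name, models in dataset_sources.items() if affected & set(models)
    )


# Hand-written child_map in the shape found in manifest.json (illustrative only).
child_map = {
    "model.shop.stg_orders": ["model.shop.orders"],
    "model.shop.orders": ["model.shop.revenue_daily"],
    "model.shop.revenue_daily": [],
    "model.shop.customers": [],
}

# Hypothetical mapping: Power BI dataset name -> dbt models it reads.
dataset_sources = {
    "Sales Overview": ["model.shop.revenue_daily"],
    "Customer 360": ["model.shop.customers"],
}

at_risk = datasets_at_risk(child_map, "model.shop.stg_orders", dataset_sources)
print(at_risk)
```

The traversal is cheap; the hard part in practice is keeping the dataset mapping accurate, which is why it is worth sourcing it from metadata rather than from a hand-edited file.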

MetricSign parses the dbt manifest on every run and uses it as the basis for downstream impact assessment. When a dbt model fails, the alert includes: which models are downstream, which Power BI datasets read from the failed model's output, and whether those datasets are scheduled to refresh before the model is repaired. This turns a dbt job failure from a transformation problem into a clear picture of business impact.

dbt monitoring in practice: what a useful alert looks like

Most teams start with job-level dbt Cloud notifications and discover their limitations after the first incident where the job succeeded and the data was wrong. The progression toward effective monitoring typically looks like this:

Level 1 — Job status alerts. Email when the job fails. Catches fatal errors. Misses silent successes with wrong output.

Level 2 — Test failure routing. Route dbt test failures to a dedicated channel with model name and failure type. Requires test coverage to be meaningful — if tests are sparse, this adds little.

Level 3 — Source freshness checks. Run dbt source freshness before dbt build in CI/CD and alert if sources are stale. Prevents running transformations on old data.
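
A gate of this kind can be as small as a loop that stops at the first failing step. The sketch below assumes that dbt source freshness exits with a non-zero code when a source breaches its error threshold; the function name run_in_order is hypothetical, and the runner is injectable so the logic can be exercised without invoking dbt.

```python
import subprocess


def run_in_order(commands, run=subprocess.run):
    """Run each command in turn and stop at the first non-zero exit code.

    Returns the command that blocked the pipeline, or None if all succeeded.
    """
    for command in commands:
        completed = run(command)
        if completed.returncode != 0:
            return command
    return None


# Freshness first, so stale sources stop the build before any model runs.
PIPELINE = [
    ["dbt", "source", "freshness"],
    ["dbt", "build"],
]

# In CI this would be called as run_in_order(PIPELINE), which invokes the
# dbt CLI. It is deliberately not called here.
```

Returning the blocking command rather than a boolean makes the alert text straightforward: the message can name the step that stopped the run.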

Level 4 — Row count baselines and anomaly detection. Compare model output row counts against a rolling baseline after each run. Catches the zero-row write, the partial load, the filter that silently removed most of the data.

Level 5 — Cross-stack lineage. Connect dbt model failures to downstream Power BI datasets, Databricks jobs, or Snowflake views that depend on them. This is the level where dbt monitoring becomes part of a broader data observability platform rather than an isolated tool.

Teams running dbt with Snowflake as the warehouse often need monitoring that spans both layers — for that combination, see dbt and Snowflake monitoring.

Frequently asked questions

What does dbt monitoring actually cover?
dbt Cloud provides job-level pass/fail status, model-level failure detail, and test failure notifications. dbt Core produces run artifacts (run_results.json, manifest.json) that contain the same information. Both tell you whether models executed without fatal errors and which dbt tests failed — not whether the output data is correct or fresh.
What is dbt source freshness monitoring?
dbt source freshness runs a query against each configured source table to check when data was last written (using a loaded_at_field). If the gap between the latest record and the current time exceeds the configured warn_after or error_after threshold, the check fails. Running this before dbt build prevents transforming stale source data.
Why can a dbt model succeed but produce wrong data?
dbt executes SQL and reports on execution status. If the SQL runs without a fatal error, the model status is 'succeeded'. A SELECT * that silently drops a renamed column, a JOIN that matches no rows due to a key mismatch, or a filter that removes most records will all produce a 'succeeded' status with incorrect output — unless a dbt test specifically checks for the failure.
How do you detect row count anomalies in dbt models?
The dbt_utils package provides generic tests such as at_least_one and equal_rowcount, which can catch an empty column or a row count mismatch between two relations. For day-of-week baseline comparisons, a data observability tool that reads the model output after each run and compares against a rolling historical baseline is more reliable than static thresholds.
How does dbt monitoring connect to Power BI?
dbt's manifest.json contains the full DAG of model dependencies. A monitoring platform that parses this manifest can map from a failed dbt model to the downstream tables it feeds, and from those tables to the Power BI datasets that read from them. This turns a dbt failure into an actionable impact statement — not just 'model failed' but 'these three Power BI datasets are now at risk'.
