Why dbt fails in production when it works locally
dbt production run failures fall into a small number of categories: the service account lacks the right database permissions, the SQL works locally but fails in the warehouse dialect, an upstream model failed and cascaded skips to everything downstream, credentials in the production profile are wrong or expired, or an incremental model has drifted out of sync with the source schema.
dbt failures cascade. When one model fails, dbt marks it as an error and marks every downstream model as skipped — not failed. A run that shows twenty skipped models and one error has one root problem: fix the upstream model and the skipped models will run. Treating each skipped model as a separate problem is a waste of time.
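The root-cause-versus-casualty distinction is visible in target/run_results.json, which dbt writes after every invocation. A minimal sketch of separating the two (the results/status layout is the stable part of the artifact; the project and model names here are hypothetical):

```python
import json

def root_causes(run_results: dict) -> list[str]:
    """Return the unique_ids of models that actually errored.

    Models with status "skipped" are downstream casualties of an
    upstream error, not separate problems to investigate.
    """
    return [
        r["unique_id"]
        for r in run_results["results"]
        if r["status"] == "error"
    ]

# In a real project: run_results = json.load(open("target/run_results.json"))
sample = {
    "results": [
        {"unique_id": "model.proj.stg_orders", "status": "error"},
        {"unique_id": "model.proj.fct_orders", "status": "skipped"},
        {"unique_id": "model.proj.dim_customers", "status": "skipped"},
    ]
}
print(root_causes(sample))  # one root problem, two casualties
```

A run with twenty skips and one error yields a one-element list: the only model worth debugging.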
dbt failure output is also structured. The error type (Database Error, Compilation Error, Runtime Error, Dependency Error) tells you which layer the problem sits in and narrows the fix space before you read the full message.
Permission denied on deployment: the most common production failure
The most frequently encountered production dbt failure — with nearly five thousand Stack Overflow views — is permission denied when deploying to a production schema. The dbt service account works in development (where permissions are broader) but fails in production because the CREATE TABLE or CREATE VIEW grants were never explicitly set on the production schema.
For Snowflake, the minimum permission set for a production service account:
GRANT USAGE ON DATABASE analytics_db TO ROLE dbt_prod_role;
GRANT USAGE ON SCHEMA analytics_db.prod TO ROLE dbt_prod_role;
GRANT CREATE TABLE ON SCHEMA analytics_db.prod TO ROLE dbt_prod_role;
GRANT CREATE VIEW ON SCHEMA analytics_db.prod TO ROLE dbt_prod_role;
GRANT SELECT ON ALL TABLES IN SCHEMA analytics_db.staging TO ROLE dbt_prod_role;
To check what a role currently has: SHOW GRANTS TO ROLE dbt_prod_role;
This needs to be a one-time setup step in the production environment, not something dbt manages itself. Organizations that skip this step hit the permissions wall the first time they run the production deployment.
SQL compilation and dialect errors
dbt compiles model SQL before sending it to the warehouse. Compilation errors catch structural problems — broken Jinja syntax, unresolvable ref() calls, circular dependencies. But the more insidious class is SQL that compiles successfully and then fails at execution because the warehouse rejects it.
This happens when engineers develop locally against DuckDB or PostgreSQL and deploy to Snowflake or BigQuery. Functions like DATEADD (Snowflake) versus DATE_ADD (BigQuery), ILIKE versus LIKE, and array syntax all differ between dialects. The SQL is syntactically valid in dbt's compilation model but invalid in the production warehouse.
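One way to sidestep the trap is dbt's built-in cross-database macros, which compile to the correct function for whichever adapter runs the model. A sketch (the model and column names are hypothetical; dbt.dateadd is the cross-database date-math macro shipped with dbt Core):

```sql
-- models/recent_events.sql
-- {{ dbt.dateadd(...) }} compiles to DATEADD on Snowflake
-- and DATE_ADD on BigQuery, so the model is dialect-portable
select *
from {{ ref('stg_events') }}
where event_at >= {{ dbt.dateadd('day', -7, 'current_date') }}
```

Writing date math, string matching, and array handling through these macros keeps a model valid across DuckDB in development and Snowflake or BigQuery in production.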
The fastest way to diagnose a compilation or execution error is to look at the compiled SQL directly:
cat target/compiled/my_project/models/my_failing_model.sql
Paste this into the warehouse query editor and run it. The exact warehouse error will surface — with line numbers and context that dbt's error output sometimes obscures. This is faster than reading dbt logs and gives you the raw feedback from the warehouse.
Broken ref() chains and the skipped model cascade
A Dependency Error fires when a ref() points to a model that does not exist in the project, has been renamed, or is located in a different package that is not installed. This blocks not just the referencing model but everything downstream of it.
For debugging, isolate the failing model first:
dbt run --select my_failing_model
Then include its upstream dependencies:
dbt run --select +my_failing_model
The + prefix tells dbt to run the model and all of its upstream dependencies. This is the correct way to rerun a specific model after fixing it without re-running the entire project.
For Dependency Errors specifically, dbt compile will catch them before any SQL runs:
dbt compile --select +my_failing_model
If compile succeeds but the run fails, the problem is a warehouse-side error (permissions, SQL syntax) not a dbt-side dependency issue.
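On dbt 1.6 and later there is also a shortcut for the fix-and-rerun loop: dbt retry re-executes the previous invocation from its point of failure, picking up the models that errored or were skipped. Shown as CLI usage (the ./prev_target path in the fallback is illustrative):

```shell
# After fixing the root-cause model:
dbt retry

# Equivalent state-based selection on older versions, pointing
# --state at the artifacts directory of the failed run:
dbt build --select result:error+ --state ./prev_target
```

Either form avoids re-running models that already succeeded.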
Credential and connection failures
Runtime Errors that begin with Could not find profile or Failed to connect indicate a problem in profiles.yml, not in the models or SQL. The two most common causes: the profile name in profiles.yml does not exactly match the profile specified in dbt_project.yml, or the production credentials (service account key, OAuth token, warehouse endpoint) are wrong or expired.
For the profile name mismatch:
# dbt_project.yml says:
profile: my_project

# profiles.yml must have exactly:
my_project:        # not my_project_prod, not MyProject
  target: prod
  outputs:
    prod: ...
A one-character difference in the profile name produces a Could not find profile error that looks like a configuration problem but is actually a typo.
The first diagnostic step for any connection failure is dbt debug. It checks the profile, target, warehouse connectivity, and dbt version in one command and reports which specific check failed. This takes less than ten seconds and rules out the most common causes before you start looking at SQL.
Incremental model schema drift
Incremental models build on top of existing tables, processing only new or changed rows on each run. That efficiency comes with a fragility: when the source schema changes (a new column appears, an existing column is renamed or changes type), the incremental model's INSERT or MERGE may no longer match the target table's structure.
The symptom is a Database Error on an incremental model that was previously working. The error message mentions a column name mismatch or type incompatibility.
The fix is a full refresh, which drops and rebuilds the table from the current schema:
dbt run --full-refresh --select my_incremental_model
This is the correct tool for schema drift. It solves the immediate problem but does not prevent the next one. For production incremental models, adding a contract (column definitions in the model's .yml file) catches schema changes at compile time rather than at runtime:
models:
  - name: my_incremental_model
    config:
      contract:
        enforced: true
    columns:
      - name: event_id
        data_type: varchar
With contract enforcement, dbt fails at compile time if the model's columns no longer match the contract — before any SQL runs.
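Contracts reject drift at compile time; for drift you want to tolerate rather than reject, incremental models also accept an on_schema_change config that controls what happens when a column difference is detected mid-run. A sketch (the model name is hypothetical; append_new_columns is one of the supported values, alongside sync_all_columns, fail, and the default ignore):

```sql
-- models/my_incremental_model.sql
{{ config(
    materialized='incremental',
    unique_key='event_id',
    on_schema_change='append_new_columns'
) }}

select * from {{ ref('stg_events') }}
{% if is_incremental() %}
-- only process rows newer than what the table already holds
where event_at > (select max(event_at) from {{ this }})
{% endif %}
```

With append_new_columns, a new source column is added to the target table instead of breaking the run; fail makes the drift loud without silently ignoring it.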
Developer credentials in production pipelines
One of the most reliable ways to get a dbt Cloud production job failure at the worst possible moment is to configure it with personal developer credentials instead of a service account. When the developer changes their password, rotates their API key, or deactivates their account, the production job fails immediately.
dbt Cloud jobs should always use a service account credential tied to a non-personal identity — a dedicated dbt Cloud service account, a machine user, or an OAuth identity that is not linked to a specific person's employment status.
For dbt Core pipelines, the same rule applies to profiles.yml in CI/CD: environment variables for credentials ({{ env_var('SNOWFLAKE_ACCOUNT') }}), not hardcoded values or personal tokens. Hardcoded personal credentials in CI/CD pipelines are also a security exposure, not just a reliability risk.
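A profiles.yml sketch along those lines (the profile name, role, database, and environment-variable names are illustrative):

```yaml
my_project:
  target: prod
  outputs:
    prod:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      role: dbt_prod_role
      database: analytics_db
      warehouse: "{{ env_var('SNOWFLAKE_WAREHOUSE') }}"
      schema: prod
      threads: 4
```

The credentials live in the CI/CD system's secret store and never appear in the repository.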
MetricSign: run step detail at incident time
MetricSign integrates with dbt Cloud via the dbt Cloud API. When a job fails, MetricSign creates a job_failed incident and extracts the run step details — the specific model that failed, the error type, and the error message — and surfaces them as a root_cause_hint on the incident.
Instead of logging into dbt Cloud, finding the run, expanding the failed step, and reading the error, you see something like: dim_customers: Database Error — column 'email' does not exist in the incident notification. The investigation starts at the right model immediately.
MetricSign also links dbt lineage (from manifest.json) to the downstream Power BI datasets that depend on the failed model. If a dbt model that feeds a dataset fails, the affected reports are visible in the incident context — before anyone opens Power BI and discovers stale data.
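The lineage side of that can be sketched from manifest.json's child_map, which maps each node to its direct downstream dependents; a transitive walk finds everything affected by a failure (the node names here are hypothetical):

```python
import json
from collections import deque

def downstream(child_map: dict[str, list[str]], node: str) -> set[str]:
    """Breadth-first walk of child_map to collect every transitive dependent."""
    seen: set[str] = set()
    queue = deque(child_map.get(node, []))
    while queue:
        child = queue.popleft()
        if child not in seen:
            seen.add(child)
            queue.extend(child_map.get(child, []))
    return seen

# In a real project:
# child_map = json.load(open("target/manifest.json"))["child_map"]
child_map = {
    "model.proj.stg_orders": ["model.proj.fct_orders"],
    "model.proj.fct_orders": ["exposure.proj.sales_dashboard"],
}
print(sorted(downstream(child_map, "model.proj.stg_orders")))
```

Exposures in the manifest are how dbt represents downstream consumers like BI datasets, which is what makes the failed-model-to-affected-report mapping possible.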