Databricks and Power BI Monitoring: The Failures That Stay Hidden Between the Two

The Databricks job ran. Power BI refreshed. The sales dashboard is showing empty rows where yesterday's numbers should be. The failure happened in between, and neither tool noticed.

What Power BI cannot see about Databricks

Power BI connects to Databricks via a JDBC/ODBC connection or through Azure Data Lake Storage (ADLS) as a pass-through. In either case, Power BI reads from a Delta table — it sees a table with rows and columns. It does not see how many rows were written by the last job, whether the schema changed in the last run, or whether the Databricks job that produced the data ran late.

This is the monitoring gap. Power BI's refresh API reports success when the import completes without a connection error. If the Delta table exists and returns data, the refresh succeeds. What the data contains — or how recently it was written — is invisible at the Power BI layer.

The failures that stay hidden:

Zero-row writes. A Databricks job encounters a schema mismatch or applies an over-restrictive filter and writes zero rows to the Delta table. The MERGE or INSERT statement executes successfully. The job status is SUCCEEDED. Power BI refreshes against a Delta table that is now empty or unchanged and loads nothing new. The dataset reports success.

Schema drift. An upstream source adds or renames a column. Depending on how the Delta table enforces or evolves its schema, the Databricks transformation job may silently drop the old column. Power BI loads data without the column. Measures and visuals that reference it show blank or zero.

Slow run anomalies. A Databricks job that normally takes 12 minutes starts taking 45 minutes due to data skew, cluster configuration changes, or upstream data volume growth. It still completes before the Power BI refresh runs — so no failure triggers. But over weeks the latency accumulates until it crosses the SLA.

A data observability platform that monitors both Databricks and Power BI closes this gap by correlating what the job wrote with what the dataset loaded.

Job failures that look like success from the outside

Databricks has its own job monitoring — the Jobs UI, run history, and webhook notifications. These tell you when a job fails with an exception. They do not tell you whether the job produced correct output.

Three categories of 'silent success' in Databricks:

Permissive parse modes. Spark readers for file sources such as CSV and JSON can run in PERMISSIVE, DROPMALFORMED, or FAILFAST mode. These are reader options, not Delta table settings. A read in PERMISSIVE mode replaces fields it cannot parse with nulls, and a read in DROPMALFORMED mode silently discards non-conforming rows. The job shows success; the table is missing records or values.

Merge write semantics. A MERGE INTO statement whose source is empty, or whose clauses match no rows, changes nothing and raises no error. If the job logic contains a date filter that is off by one day, or a join key that mismatches after an upstream rename, the merge executes and reports success with zero rows inserted or updated.

Empty streaming micro-batches. A Databricks Structured Streaming job can process a batch of zero records because the upstream Kafka or Event Hubs partition is empty or lagging. The job reports a successful micro-batch with zero rows processed. Downstream Delta tables do not update.

None of these register as failures in Databricks. All of them leave Power BI with data that is either empty, incomplete, or stale. The only way to detect them is to instrument the output: count rows written, compare against a baseline, check whether the schema changed between runs. MetricSign does this for Databricks by parsing job run metadata and comparing output metrics against a rolling baseline.
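As a rough sketch of what "count rows written" can look like: Delta Lake records per-commit operation metrics in the table history (the output of DESCRIBE HISTORY), with the counts stored as strings. The helper below takes one such metrics map as a plain dictionary and flags a commit that changed no rows. The function names and the choice of which keys to sum are illustrative assumptions, not a description of how MetricSign does it.

```python
from typing import Optional

# Metric keys Delta Lake reports for two common write operations.
# Which keys to sum per operation is a choice made for this sketch.
ROW_METRIC_KEYS = {
    "WRITE": ["numOutputRows"],
    "MERGE": [
        "numTargetRowsInserted",
        "numTargetRowsUpdated",
        "numTargetRowsDeleted",
    ],
}


def rows_changed(operation: str, metrics: dict) -> Optional[int]:
    """Return the number of rows a commit changed, or None if the
    operation is not one this sketch tracks (e.g. OPTIMIZE)."""
    keys = ROW_METRIC_KEYS.get(operation)
    if keys is None:
        return None
    # History metrics arrive as strings, so cast before summing.
    # A missing key is treated as 0 (an assumption of this sketch).
    return sum(int(metrics.get(key, 0)) for key in keys)


def is_zero_row_commit(operation: str, metrics: dict) -> bool:
    """True only for tracked operations that changed no rows."""
    return rows_changed(operation, metrics) == 0


if __name__ == "__main__":
    merge_metrics = {
        "numSourceRows": "0",
        "numTargetRowsInserted": "0",
        "numTargetRowsUpdated": "0",
        "numTargetRowsDeleted": "0",
    }
    print(is_zero_row_commit("MERGE", merge_metrics))
```

A check like this only says that nothing was written. Whether zero is abnormal for that job still depends on comparing against a baseline.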

Connecting Databricks failures to Power BI impact

When a Databricks job writes partial data, knowing that the job went wrong is not enough. The immediate question is: which Power BI reports are currently showing that data to users?

Answering this requires lineage: the map from Databricks Delta table to Power BI dataset to report. Without it, a Databricks alert triggers a manual investigation: which datasets read from this table? Which reports depend on those datasets? For a data platform with 30 Delta tables feeding 15 Power BI datasets, that is a 45-minute lookup.

With lineage, the alert fires with context: 'Databricks job daily_sales_load wrote 0 rows at 03:14. Downstream Power BI datasets Sales Executive, Revenue by Region are scheduled to refresh at 06:00. Refreshes paused pending resolution.' The data team has both the failure and the impact scope in one message.

This is the value of cross-stack monitoring. MetricSign connects the Databricks job layer to the Power BI dataset layer via a shared lineage graph, so a job anomaly immediately surfaces its downstream risk. For teams running dbt alongside Databricks, see dbt monitoring for how the transformation layer adds another node to this chain.
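The lineage lookup described above amounts to a graph traversal. The sketch below assumes a hand-written lineage map in which each node lists the nodes that read from it; the table name, the report names, and the `delta:`/`dataset:`/`report:` prefixes are hypothetical and exist only to make the example concrete.

```python
from collections import deque

# Hypothetical lineage edges: node -> nodes that read from it.
LINEAGE = {
    "delta:sales.daily_sales": [
        "dataset:Sales Executive",
        "dataset:Revenue by Region",
    ],
    "dataset:Sales Executive": ["report:Exec Overview"],
    "dataset:Revenue by Region": [
        "report:Regional Revenue",
        "report:Exec Overview",
    ],
}


def downstream(graph: dict, start: str) -> list:
    """Return every node reachable from start, breadth-first,
    visiting each node once."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def impact_message(graph: dict, job: str, table: str, rows: int) -> str:
    """Build an alert line that names the affected datasets and reports."""
    affected = downstream(graph, table)
    datasets = [n.split(":", 1)[1] for n in affected if n.startswith("dataset:")]
    reports = [n.split(":", 1)[1] for n in affected if n.startswith("report:")]
    return (
        f"Databricks job {job} wrote {rows} rows. "
        f"Downstream datasets: {', '.join(datasets) or 'none'}. "
        f"Reports at risk: {', '.join(reports) or 'none'}."
    )


if __name__ == "__main__":
    print(impact_message(LINEAGE, "daily_sales_load", "delta:sales.daily_sales", 0))
```

Keeping the map as plain adjacency lists means a dbt model or any other intermediate layer can be added as one more node without changing the traversal.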

What to actually monitor in Databricks

The right set of signals for Databricks monitoring is narrower than it appears:

Job completion status — The baseline. Did the job complete within the expected window? A job that runs 3× its historical median is a leading indicator of a problem even if it eventually succeeds.

Rows written per run — The most valuable output metric. Compare the current write count against a rolling day-of-week baseline. A delta load that normally writes 80,000 rows and writes 200 rows is a problem regardless of status.

Schema of the output Delta table — Compare column names and types after each job run. Any column that was present in the previous run and is absent in the current run is a schema change worth alerting on.

Cluster startup time — Unusually long cluster startup delays are an operational signal that the downstream jobs in the cluster will all run late. Catching this before the business impact is visible requires watching startup latency as a trend.

Downstream dataset freshness — After the Databricks job completes, what is the watermark in the Delta table? Is it what the schedule implies? This cross-checks the job output against the expected data timestamp, not just the job execution time.

MetricSign monitors these five signals for each Databricks workspace connected to it, correlating anomalies with the downstream Power BI datasets in the same incident graph.
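Three of the signals above reduce to simple threshold checks, sketched here in plain Python. The 3× duration factor comes from the text; the 50% row-count floor and the 24-hour freshness lag are placeholder thresholds assumed for illustration, and the function names are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import median


def duration_is_anomalous(current_min, history_min, factor=3.0):
    """True when the current run took more than `factor` times
    the median of earlier run durations (in minutes)."""
    return current_min > factor * median(history_min)


def rows_are_anomalous(current_rows, same_weekday_rows, min_ratio=0.5):
    """True when the write count falls below `min_ratio` of the median
    write count for earlier runs on the same day of the week."""
    return current_rows < min_ratio * median(same_weekday_rows)


def watermark_is_stale(watermark, job_finished, max_lag=timedelta(hours=24)):
    """True when the newest data timestamp in the table trails the
    job's finish time by more than `max_lag`."""
    return job_finished - watermark > max_lag


if __name__ == "__main__":
    # A job that normally takes about 12 minutes ran for 45.
    print(duration_is_anomalous(45, [11, 12, 12, 13, 14]))
    # A load that normally writes about 80,000 rows wrote 200.
    print(rows_are_anomalous(200, [78_000, 80_000, 82_000]))
    # The job finished at 03:14 but the newest data is over a day old.
    print(watermark_is_stale(datetime(2024, 1, 1), datetime(2024, 1, 2, 3, 14)))
```

Using the median rather than the mean keeps a single unusually large or small historical run from shifting the baseline.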

Schema drift: the failure Databricks and Power BI both miss

Schema drift deserves its own attention because it is the failure mode where both Databricks and Power BI appear healthy while the data is wrong.

The sequence: an upstream data source renames a column, say gross_margin_pct becomes margin_pct. The Databricks job reads from the source and applies a transformation. If the job writes with mergeSchema, the new column is added and the old one stays in the table but receives only nulls from then on. If the job overwrites the table with overwriteSchema, the old column disappears altogether. Either way the job succeeds. The Power BI dataset refreshes and loads the table, with gross_margin_pct now absent or empty for the new rows. The measure [Gross Margin %] built on that column evaluates to blank. The visuals show blank or zero.

Neither tool flagged this as a failure. Databricks processed the schema change per its configured policy. Power BI loaded the data it found. The drift propagated silently through the entire chain.

Detecting schema drift requires comparing the column set of the Delta table before and after each job run, and alerting when any column that was present in the previous run is absent in the current one. This check runs in seconds and catches column removals and renames whether or not they produce a Databricks exception.
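The before-and-after comparison can be sketched as a dictionary diff. The example assumes each schema snapshot has already been captured as a mapping of column name to type string (in PySpark, something like `dict(spark.table(name).dtypes)` could produce one); the function names and the alerting rule are illustrative.

```python
def diff_schema(previous: dict, current: dict) -> dict:
    """Compare two {column_name: type} snapshots of a table schema."""
    removed = sorted(set(previous) - set(current))
    added = sorted(set(current) - set(previous))
    # Columns present in both snapshots whose type string changed.
    retyped = sorted(
        col for col in set(previous) & set(current)
        if previous[col] != current[col]
    )
    return {"removed": removed, "added": added, "retyped": retyped}


def should_alert(diff: dict) -> bool:
    """Alert on removed or retyped columns; additions alone are
    treated as harmless in this sketch."""
    return bool(diff["removed"] or diff["retyped"])


if __name__ == "__main__":
    before = {"order_id": "bigint", "gross_margin_pct": "double"}
    after = {"order_id": "bigint", "margin_pct": "double"}
    result = diff_schema(before, after)
    print(result, should_alert(result))
```

A rename shows up as one removed column plus one added column, which is why the removal is the part worth alerting on.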

For teams monitoring data quality in Databricks as part of a broader observability strategy, this connects directly to the data observability vs data quality distinction: schema drift detection is an observability signal, not a quality rule.

Frequently asked questions

Why doesn't Power BI detect Databricks job failures?
Power BI reads from a Delta table via a connection. It sees rows and columns. It does not monitor the Databricks job that wrote them. If the job wrote zero rows or a wrong schema, Power BI loads whatever is there and reports success. Detecting job-level failures requires monitoring at the Databricks layer, not at the Power BI layer.
What is a zero-row write in Databricks?
A zero-row write occurs when a Databricks job executes successfully but writes no data to its output table — because a filter matched no rows, a merge statement found no matching keys, or schema enforcement silently dropped all incoming rows. The job status is Succeeded; the output table is empty or unchanged.
How do you detect schema drift in a Databricks Delta table?
Compare the column names and types of the Delta table before and after each job run. Any column present before but absent after is a schema change. This check is a metadata read — it does not require scanning the full table — and can be automated against the Delta table schema history.
How does Databricks monitoring connect to Power BI monitoring?
Lineage is the connection. A monitoring platform that maps from Databricks Delta tables to Power BI datasets can take a Databricks job anomaly and immediately identify which Power BI datasets are at risk. Without lineage, a Databricks alert triggers a separate, manual investigation to establish downstream impact.
What Databricks monitoring signals matter most for data quality?
Rows written per run (compared against a day-of-week baseline), output schema comparison between runs, and job duration relative to historical median. These three signals catch zero-row writes, schema drift, and performance degradation — the three failure modes that most often produce wrong data in downstream Power BI reports.
