MetricSign

Databricks Power BI Connector: What the Connection Doesn't Monitor

The Databricks Power BI connector is connected. Refreshes are running. But which Databricks job last wrote to that Delta table, and was the output correct?

What the connector does — and what it doesn't

The Databricks Power BI connector (available via Power BI Desktop, Power BI Service, and Partner Connect) establishes a data source connection from Power BI to a Databricks workspace. It supports both Import and DirectQuery modes. In Import mode, Power BI copies data from Delta tables on each scheduled refresh. In DirectQuery mode, Power BI sends live queries to Databricks on each visual interaction.

The connector is a transport mechanism. It answers one question reliably: can Power BI read from this Delta table right now? It provides no information about:

- Whether the Databricks job that populates the Delta table ran successfully
- Whether the job ran on schedule or arrived late
- Whether the data in the table is the correct volume for this point in time
- Whether the schema of the table has changed since the last time Power BI read from it

This distinction matters in practice. A team that sets up the Databricks connector and schedules Power BI refreshes has connectivity. They do not have monitoring. The connector will import whatever is in the Delta table at the time of the refresh — correct data, stale data, partial data, or zero rows — with equal success.

Monitoring the full chain requires a layer above both tools. A data observability platform that connects to both Databricks and Power BI can watch the job layer and the consumption layer simultaneously.

Import mode vs DirectQuery: different failure surfaces

The choice between Import and DirectQuery affects where failures show up and what monitoring you need.

Import mode copies data from Databricks to Power BI's in-memory model on a schedule. The refresh has a defined completion time, and the data age is predictable — it reflects the state of the Delta table at the moment the import ran. If the Databricks job that writes the Delta table failed or ran late, the Power BI dataset imports stale data from the previous successful run. The refresh completes successfully. Users see data from the previous cycle.

The monitoring requirement in Import mode: know when the Databricks job last wrote to the Delta table, and whether the Power BI import picked up current data or stale data.

DirectQuery mode pushes queries from Power BI to Databricks in real time. There is no refresh cycle: every visual interaction triggers a live query, so the data is always current relative to the Delta table. It also means Power BI response time depends on Databricks. If the job that writes the Delta table is running slowly, or is competing for the same compute, queries to that table from Power BI will be slow or will time out.

The monitoring requirement in DirectQuery mode: watch Databricks job latency and cluster performance, because those directly affect the query response time that Power BI users experience.
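One way to turn that requirement into a signal is a percentile threshold on query durations. The sketch below is illustrative only: it assumes you already collect the durations (in seconds) of the queries Power BI sends to Databricks, and the function names and the 10-second threshold are hypothetical.

```python
import statistics


def p95_latency(durations_s):
    """Return the 95th-percentile duration; needs at least two samples."""
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    return statistics.quantiles(durations_s, n=20)[-1]


def latency_breach(durations_s, threshold_s):
    """True when the p95 query duration exceeds the threshold."""
    return p95_latency(durations_s) > threshold_s


# Hypothetical sample: 18 fast queries and 2 that came close to timing out.
durations = [1.2] * 18 + [45.0, 60.0]
print(latency_breach(durations, threshold_s=10.0))
```

A percentile is used rather than an average so that a small number of near-timeouts is not hidden by many fast queries.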

For most teams using Import mode — which is the more common pattern for scheduled BI reporting — the monitoring gap is: did the Databricks job write the expected data before the Power BI refresh ran? This requires watching both tools, not just the connector between them. See Databricks and Power BI monitoring for the specific signals to watch in Databricks.
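The Import-mode question can be sketched as a small classification over three timestamps. This is a minimal illustration, not a prescribed implementation: the function name, the status labels, and the 24-hour freshness window are all assumptions, and the timestamps would come from wherever you record job runs and refreshes.

```python
from datetime import datetime, timedelta


def classify_import(job_finished_at, refresh_started_at, watermark, max_age):
    """Classify one Import-mode refresh as 'fresh', 'job_late', or 'stale_data'.

    job_finished_at:    when the Databricks job last finished (None if not yet).
    refresh_started_at: when the Power BI refresh began.
    watermark:          newest timestamp found in the Delta table.
    max_age:            how old the watermark may be at refresh time.
    """
    if job_finished_at is None or job_finished_at > refresh_started_at:
        return "job_late"
    if refresh_started_at - watermark > max_age:
        return "stale_data"
    return "fresh"


# Hypothetical run: the job finished on time but wrote nothing new.
refresh = datetime(2024, 5, 14, 6, 0)
status = classify_import(
    job_finished_at=datetime(2024, 5, 14, 2, 30),
    refresh_started_at=refresh,
    watermark=datetime(2024, 5, 12, 23, 59),
    max_age=timedelta(hours=24),
)
print(status)
```

Separating "job_late" from "stale_data" matters because the first is a scheduling problem and the second is a data problem, even though both leave the report showing old numbers.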

The silent failure scenario: import runs on empty

Here is the failure pattern that plays out most often in teams that have the Databricks connector set up but no monitoring layer above it.

A Databricks job processes a Delta merge — updating the fact_sales Delta table with today's transactions. Due to a source schema change, the merge statement finds no matching rows and executes successfully with zero rows inserted. The Delta table's watermark column still shows the previous successful run.

At 06:00, the Power BI Import mode refresh runs. It reads from fact_sales. The data in the table is from yesterday. The import completes successfully — there was data to import, the connection worked, the query returned results. The Power BI dataset refresh shows Succeeded.

At 09:00, the sales team opens the executive dashboard. The revenue numbers are from yesterday. They escalate to the data team.

The engineer checks Power BI: refresh succeeded. Checks the Databricks job history: job succeeded. Looks at the fact_sales Delta table: the table has data. It takes 25 minutes to trace back to the merge statement output and realize the zero-row write happened overnight.

With a monitoring layer that watches both Databricks output and Power BI import, the zero-row merge would have fired an alert at 02:30 when the merge returned zero rows — three and a half hours before the Power BI refresh ran, and six and a half hours before the escalation.
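A zero-row check of this kind can be small. Delta Lake records operation metrics for each commit in the table history (visible via `DESCRIBE HISTORY`), and for a MERGE these include keys such as `numTargetRowsInserted`, `numTargetRowsUpdated`, and `numTargetRowsDeleted`, stored as strings. The sketch below assumes you have already fetched that metrics map for the latest commit; the function names and the `min_rows` threshold are hypothetical.

```python
MERGE_KEYS = ("numTargetRowsInserted", "numTargetRowsUpdated", "numTargetRowsDeleted")


def merge_rows_changed(operation_metrics):
    """Sum the rows a MERGE inserted, updated, or deleted.

    Metric values arrive as strings, so each one is converted with int().
    Missing keys count as zero.
    """
    return sum(int(operation_metrics.get(key, 0)) for key in MERGE_KEYS)


def check_merge(operation_metrics, min_rows=1):
    """Return an alert message when the merge changed fewer than min_rows rows."""
    changed = merge_rows_changed(operation_metrics)
    if changed < min_rows:
        return f"ALERT: merge changed {changed} rows (expected at least {min_rows})"
    return None


# Hypothetical metrics from the overnight run described above.
overnight = {
    "numTargetRowsInserted": "0",
    "numTargetRowsUpdated": "0",
    "numTargetRowsDeleted": "0",
}
print(check_merge(overnight))
```

Run immediately after the job, a check like this surfaces the problem when the merge completes rather than when someone opens the dashboard.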

Schema changes that break without an error

The Databricks Power BI connector handles schema changes gracefully in some respects — Power BI will pick up new columns in the next import cycle. What it does not do is alert you when a column disappears.

If an upstream source system renames a column, the Databricks Delta table can be updated to reflect the new schema (if the job uses schema evolution), leaving Power BI with a table that is missing a column it was previously reading. Power BI's Import mode will load the available columns without error. Measures and visuals built on the missing column evaluate to blank or zero.

The Power BI connector does not detect this. The Databricks job does not surface it as a failure if the table schema evolved cleanly. The detection gap requires a schema comparison between the Delta table state before and after each job run — a check that neither the Databricks connector nor Power BI's refresh monitoring provides.
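The comparison itself is a set difference over column names. The sketch below assumes you capture the column list before and after each job run (for example from a PySpark DataFrame's `columns` attribute) and store the previous list somewhere; the function name and the sample column names are illustrative.

```python
def diff_columns(previous, current):
    """Compare two column lists and report what was removed and what was added."""
    prev, curr = set(previous), set(current)
    return {
        "removed": sorted(prev - curr),  # columns Power BI may still depend on
        "added": sorted(curr - prev),    # new columns, usually harmless
    }


# Hypothetical rename upstream: revenue became net_revenue.
before = ["order_id", "customer_id", "revenue"]
after = ["order_id", "customer_id", "net_revenue"]

changes = diff_columns(before, after)
if changes["removed"]:
    print(f"ALERT: columns removed: {changes['removed']}")
```

A rename shows up as one removed column plus one added column, which is why the removed list is the one worth alerting on.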

For teams who want this check without building it themselves, MetricSign monitors the Databricks integration and runs a column comparison on each job run, surfacing schema changes as a separate signal alongside volume and freshness anomalies.

What to add on top of the connector

The Databricks Power BI connector is the right foundation for connecting the two tools. The monitoring layer that belongs above it:

Job completion and duration tracking — Know when each Databricks job that writes a Delta table completed, whether it ran within its normal duration window, and whether it completed before the Power BI refresh schedule.

Row count comparison per run — After each Databricks write, compare the row count (or rows written in the last merge/insert) against the day-of-week baseline for that table. Alert when the deviation exceeds a configured threshold.

Delta table watermark check — Read the MAX of the primary timestamp column in each Delta table that Power BI imports from. Compare it against the Power BI refresh schedule. If the Delta table watermark is stale relative to the import time, the import is loading outdated data.

Schema comparison between runs — Compare the column set of each Delta table after each Databricks job run. Alert when any previously-present column is absent.

Power BI refresh completion correlation — After the Power BI import runs, confirm that the dataset row count is consistent with what the Databricks job wrote. A significant discrepancy indicates a connection or filter issue at the import layer.

These five checks close the monitoring gap that the connector itself does not address. Teams running this combination as part of a broader pipeline that includes Snowflake or dbt should read dbt Snowflake monitoring for how these signals extend across the full chain.
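As a rough sketch of how the row count, duration, and refresh correlation checks might share one helper, the example below compares each value against a simple mean baseline. Everything here is an assumption for illustration: the dictionary keys, the tolerance values, and the use of a plain mean rather than a more robust statistic.

```python
from statistics import mean


def deviates_from_baseline(value, history, tolerance=0.3):
    """True when value differs from the mean of history by more than tolerance.

    tolerance is a fraction of the baseline (0.3 means 30 percent).
    """
    baseline = mean(history)
    if baseline == 0:
        return value != 0
    return abs(value - baseline) / baseline > tolerance


def run_checks(run, same_weekday_rows, recent_durations_min, powerbi_rows):
    """Return the names of the checks that failed for one job run."""
    alerts = []
    # Rows written vs. the same weekday in previous weeks.
    if deviates_from_baseline(run["rows_written"], same_weekday_rows):
        alerts.append("row_count")
    # Job duration vs. recent runs.
    if deviates_from_baseline(run["duration_min"], recent_durations_min, tolerance=0.5):
        alerts.append("duration")
    # Rows in the Power BI dataset vs. rows in the Delta table.
    if deviates_from_baseline(powerbi_rows, [run["table_rows"]], tolerance=0.01):
        alerts.append("import_mismatch")
    return alerts


# Hypothetical zero-row run: normal duration, table intact, nothing written.
run = {"rows_written": 0, "duration_min": 11, "table_rows": 1_200_000}
alerts = run_checks(
    run,
    same_weekday_rows=[48_000, 51_000, 49_500],
    recent_durations_min=[12, 10, 11],
    powerbi_rows=1_200_000,
)
print(alerts)
```

In this example only the row count check fires, which matches the silent failure described earlier: the job duration and the import both look normal.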

Frequently asked questions

What is the Databricks Power BI connector?
The Databricks Power BI connector connects Power BI to a Databricks workspace, enabling Import or DirectQuery mode access to Delta tables. It handles authentication, data transfer, and query execution — but provides no monitoring of the Databricks jobs that write to those tables.
Does the Databricks Power BI connector detect job failures?
No. The connector reads from Delta tables — it does not monitor the Databricks jobs that write them. If a Databricks job fails, writes zero rows, or produces incorrect data, the connector will import whatever is in the Delta table without any indication that something went wrong upstream.
What is the difference between Import and DirectQuery mode for Databricks in Power BI?
Import mode copies data from Databricks to Power BI's in-memory model on a schedule. DirectQuery mode sends live queries to Databricks on each visual interaction. Import mode produces stale data when the underlying Databricks job fails; DirectQuery mode is affected by Databricks cluster performance and latency.
How do I know if my Databricks Power BI import loaded stale data?
Compare the maximum timestamp in the Power BI dataset after import against the expected freshness for that table. If the maximum timestamp belongs to the previous run rather than the current one, then either the Databricks job ran late or wrote no new data, or the import ran before the job completed.
What monitoring should I add on top of the Databricks Power BI connector?
At minimum: a job completion check (did the Databricks job that writes the table run before the Power BI refresh?), a row count comparison against baseline, and a Delta table watermark check. These three signals catch the most common failure modes that the connector itself does not surface.
