MetricSign

End-to-End Data Lineage: From ADF to Power BI

Without a map of your data chain, every investigation starts from scratch.


Without lineage, every investigation starts from scratch

Your Power BI dashboard is showing wrong numbers. Where do you start? Without lineage metadata, the answer is: anywhere.

Data lineage is one of the core capabilities that separates mature data observability platforms from single-tool monitoring — it's the connective tissue that links a root cause in ADF to a symptom in Power BI.

You check the dataset refresh history in Power BI Service. The refresh completed successfully at 06:04. You look at ADF pipeline runs. The pipeline succeeded. You query the staging database directly. The data looks present. You check the dbt Cloud job that runs before ADF. It succeeded with two warnings. You read the warnings. One of them is about a model that produced wrong results because a source table had unexpected nulls.

Fifty minutes later you've found the problem. Now you need to figure out which other datasets depend on the same dbt model. That's another thirty minutes of checking datasource configurations manually across Power BI workspaces.

The investigation wasn't hard. It was just long, because you had to traverse a chain that exists in your infrastructure but isn't documented anywhere you can query. Lineage makes that chain explicit.

See also: Best data observability platforms in 2026

The full chain: five layers between a source system and a report

A typical enterprise data chain passes through five distinct layers before a row of data reaches a Power BI visual.

Source systems hold the original data: SQL Server databases, SAP exports, REST APIs, Dataverse tables. These systems write data on their own schedule, independently of anything downstream.

Orchestration pipelines (ADF, Fabric Pipelines) move data between layers: from source to staging, from staging to warehouse, from warehouse to semantic model. They determine when data moves and how much is loaded in each run.

Transformation layers (dbt models, Synapse Analytics views, Databricks jobs) reshape raw data into the form that analytical queries need. Failures here often produce technically successful loads with semantically wrong results.

The Power BI semantic model sits on top of the transformed data. It defines the business logic: calculated columns, measures, row-level security. It refreshes from the transformation layer output.

Reports and dashboards read from the semantic model and present data to business users. A failure in any upstream layer eventually shows up here, but tracing it back requires knowing the exact path.
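
To make the five layers concrete, here is a minimal sketch of the chain as typed nodes and edges. The class names and the asset names are illustrative assumptions, not a prescribed schema; the point is only that each asset carries its layer and each edge points from a producer to its consumer.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    SOURCE = 1
    ORCHESTRATION = 2
    TRANSFORMATION = 3
    SEMANTIC_MODEL = 4
    REPORT = 5


@dataclass(frozen=True)
class Node:
    name: str
    layer: Layer


# Hypothetical assets, one per layer, listed in the direction data flows.
chain = [
    Node("orders_source", Layer.SOURCE),
    Node("orders_staging", Layer.ORCHESTRATION),
    Node("daily_sales", Layer.TRANSFORMATION),
    Node("Sales Overview (dataset)", Layer.SEMANTIC_MODEL),
    Node("Sales Overview (report)", Layer.REPORT),
]

# Each edge points from an asset to the asset that consumes it.
edges = list(zip(chain, chain[1:]))


def describe(edges):
    """Render edges as readable 'LAYER:name -> LAYER:name' strings."""
    return [f"{a.layer.name}:{a.name} -> {b.layer.name}:{b.name}" for a, b in edges]
```

Real chains branch and merge, so a production graph would store edges as a set rather than a single list; the linear chain here just mirrors the layer order described above.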

Lineage data is assembled from multiple sources, not queried from one

There is no single API that returns a complete lineage graph for a modern data stack. Lineage is assembled from multiple metadata sources and requires stitching them together.

ADF pipeline run logs tell you which tables a pipeline wrote to and when. The output of each Copy activity run reports rows read and rows copied (the rowsRead and rowsCopied fields), so volume validation data lives alongside lineage signals.
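
As a rough sketch of pulling those volume signals out, the function below walks a list of activity-run records and keeps the row counts for Copy activities. It assumes you have already fetched the records as dictionaries with activityType, activityName, pipelineName, status and an output object containing rowsRead and rowsCopied; the sample records and the "suspicious" rule are made up for illustration.

```python
def copy_volume_signals(activity_runs):
    """Return one record per Copy activity run with its row counts."""
    signals = []
    for run in activity_runs:
        if run.get("activityType") != "Copy":
            continue
        output = run.get("output") or {}
        rows_read = output.get("rowsRead")
        rows_copied = output.get("rowsCopied")
        signals.append({
            "pipeline": run.get("pipelineName"),
            "activity": run.get("activityName"),
            "status": run.get("status"),
            "rows_read": rows_read,
            "rows_copied": rows_copied,
            # Illustrative rule: flag runs that moved no rows or lost rows.
            "suspicious": rows_copied in (None, 0) or rows_read != rows_copied,
        })
    return signals


# Hand-written sample shaped like activity-run records; values are made up.
sample_runs = [
    {"pipelineName": "orders_staging", "activityName": "CopyOrders",
     "activityType": "Copy", "status": "Succeeded",
     "output": {"rowsRead": 1200, "rowsCopied": 1200}},
    {"pipelineName": "orders_staging", "activityName": "CopyOrderLines",
     "activityType": "Copy", "status": "Succeeded",
     "output": {"rowsRead": 5300, "rowsCopied": 0}},
    {"pipelineName": "orders_staging", "activityName": "LookupWatermark",
     "activityType": "Lookup", "status": "Succeeded", "output": {}},
]
signals = copy_volume_signals(sample_runs)
```

The second sample run is the case worth catching: the activity reports success, yet nothing landed in the destination.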

dbt manifests (the manifest.json artifact that dbt writes when it runs or compiles a project) contain the full DAG of model dependencies. Each node lists its upstream dependencies under depends_on, and the manifest's parent_map and child_map expose the graph in both directions. Parsing a dbt manifest gives you the transformation layer lineage automatically, without any manual documentation.
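
A minimal sketch of that parsing step, assuming a manifest whose nodes carry resource_type and depends_on.nodes. The project name "shop", the source name "erp" and the dependency of daily_sales on compute_margins are hypothetical, and the inline JSON is a trimmed stand-in for a real manifest.json loaded from the target directory.

```python
import json
from collections import defaultdict


def model_lineage(manifest):
    """Build upstream and downstream maps keyed by dbt unique_id."""
    upstream = {}
    downstream = defaultdict(list)
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots and other node types
        parents = node.get("depends_on", {}).get("nodes", [])
        upstream[unique_id] = list(parents)
        for parent in parents:
            downstream[parent].append(unique_id)
    return upstream, dict(downstream)


# Trimmed, hand-written stand-in for manifest.json (hypothetical project).
manifest = json.loads("""
{
  "nodes": {
    "model.shop.compute_margins": {
      "resource_type": "model",
      "depends_on": {"nodes": ["source.shop.erp.order_line_items"]}
    },
    "model.shop.daily_sales": {
      "resource_type": "model",
      "depends_on": {"nodes": ["model.shop.compute_margins"]}
    }
  }
}
""")
upstream, downstream = model_lineage(manifest)
```

Inverting depends_on by hand keeps the sketch independent of any one manifest version; where the manifest already ships a child_map, reading it directly is simpler.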

Power BI datasource metadata (available via the REST API) tells you which servers and databases each dataset connects to, and the admin scanner API can add table-level detail for each dataset. Matching datasource connection details to the destination tables in your pipeline logs and dbt manifest closes the link between the transformation layer and the semantic layer.
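
The matching step might look like the sketch below. It assumes each Power BI datasource record exposes connectionDetails with server and database, that you have attached the owning dataset name yourself under a "dataset" key, and that your pipeline metadata yields a server and database per sink. The normalization rules are a deliberately simple assumption and will not cover every naming convention.

```python
def normalize(server, database):
    """Reduce a server/database pair to a comparable key."""
    host = server.strip().lower()
    if host.startswith("tcp:"):
        host = host[4:]
    host = host.split(",")[0]  # drop a ",1433" style port suffix
    host = host.split(":")[0]  # drop a ":1433" style port suffix
    return host, database.strip().lower()


def match_datasets(datasources, sinks):
    """Link each dataset to the pipelines that write to its database."""
    by_key = {}
    for sink in sinks:
        key = normalize(sink["server"], sink["database"])
        by_key.setdefault(key, []).append(sink["pipeline"])
    links = {}
    for ds in datasources:
        details = ds["connectionDetails"]
        key = normalize(details["server"], details["database"])
        links[ds["dataset"]] = by_key.get(key, [])
    return links


# Made-up sample records; server and database names are hypothetical.
datasources = [
    {"dataset": "Sales Overview",
     "connectionDetails": {"server": "tcp:DWH-Prod.database.windows.net,1433",
                           "database": "Analytics"}},
    {"dataset": "Monthly Actuals",
     "connectionDetails": {"server": "finance-sql", "database": "Ledger"}},
]
sinks = [
    {"pipeline": "orders_staging",
     "server": "dwh-prod.database.windows.net", "database": "analytics"},
]
links = match_datasets(datasources, sinks)
```

Datasets that come back with an empty list are the unmatched edges: candidates for a looser matching rule or a manual mapping.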

The resulting graph isn't perfect. Not every edge is captured with full fidelity — connection string matching is probabilistic when environments use different naming conventions. But a lineage graph with 80% coverage eliminates most of the manual investigation work.

With lineage, the same investigation takes ten minutes

The same scenario — wrong numbers in a Power BI dashboard — handled with lineage metadata available.

You open the incident in your monitoring tool. The alert includes the affected dataset name and a link to its upstream chain. The chain shows: the daily_sales dbt model → the orders_staging ADF pipeline → the orders_source SQL Server table. This morning's dbt run shows the cause: the compute_margins model failed due to unexpected nulls in order_line_items.

You check which other Power BI datasets read from daily_sales. Lineage shows three: Sales Overview, Revenue by Region, Monthly Actuals. All three refreshed after the dbt failure and are currently serving data that excludes the affected calculation.

The full picture — root cause, affected datasets, scope of impact — took ten minutes to establish. The fix is separate, but the investigation is done.

Upstream and downstream traversal serve different purposes

Lineage works in both directions, and each direction is useful for a different phase of an incident.

Upstream traversal (root cause analysis) starts from a broken report or dataset and walks back toward the source. When a report shows wrong numbers, upstream traversal answers: which pipeline produced this data? Which dbt model transformed it? Which source system wrote the underlying records? This is the direction you traverse during investigation.

Downstream traversal (impact assessment) starts from a known failure and walks forward toward end users. When you discover that a dbt model failed, downstream traversal answers: which ADF pipelines depend on this model's output? Which Power BI datasets refreshed from those pipelines? Which reports are currently serving the affected data? This is the direction you traverse when deciding who to notify and how urgently.

Most incidents require both. Upstream traversal to find the root cause; downstream traversal to establish scope before you touch anything.
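
Both directions are the same breadth-first walk over different indexes of the same edge list. The sketch below assumes lineage is stored as (producer, consumer) pairs; the function names are hypothetical and the edges reuse the example assets from this article.

```python
from collections import defaultdict, deque


def build_index(edges):
    """Index (producer, consumer) edges by consumer and by producer."""
    children, parents = defaultdict(set), defaultdict(set)
    for producer, consumer in edges:
        children[producer].add(consumer)
        parents[consumer].add(producer)
    return parents, children


def traverse(start, neighbours):
    """Breadth-first walk; returns every node reachable from start."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


edges = [
    ("orders_source", "orders_staging"),
    ("orders_staging", "daily_sales"),
    ("daily_sales", "Sales Overview"),
    ("daily_sales", "Revenue by Region"),
    ("daily_sales", "Monthly Actuals"),
]
parents, children = build_index(edges)

# Upstream: where could the wrong numbers in this dataset come from?
root_cause_candidates = traverse("Sales Overview", parents)

# Downstream: what is affected if this dbt model is wrong?
impact = traverse("daily_sales", children)
```

Keeping one edge list and two indexes means the root cause view and the impact view can never disagree about what depends on what.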

Frequently asked questions

What is data lineage for Power BI?
Data lineage maps the full path from source systems through pipelines and transformation layers to Power BI reports. It makes the dependency chain queryable, so when something breaks you can identify the root cause and all downstream effects without manually checking each component.
How does data lineage reduce Power BI incident investigation time?
By making the dependency chain explicit. Without lineage, an engineer traverses each layer manually — ADF logs, dbt runs, database queries, Power BI service — to reconstruct what depends on what. With lineage, that map is already built, and the investigation becomes a lookup rather than a search.
Where does data lineage metadata come from?
Multiple sources assembled together: ADF pipeline run logs (destination tables, row counts), dbt manifests (model dependency graph), and Power BI REST API datasource metadata (which tables each dataset reads from). Matching these sources builds the full chain.
What is downstream impact analysis in data lineage?
Downstream traversal starts from a known failure (a failed dbt model, an ADF pipeline error) and follows the dependency chain forward to identify which Power BI datasets and reports are currently affected. It answers the impact scoping question before you start fixing.
How does lineage enable proactive Power BI alerting?
When a lineage-aware monitoring system detects a pipeline or transformation failure, it can immediately identify all downstream datasets scheduled to refresh and alert before they load stale or incorrect data — turning a reactive incident into a proactive warning.
