MetricSign

Data Lineage Tools: A Practical Guide for Microsoft Stack Teams

Power BI says 'refresh succeeded.' The report shows blank data. Somewhere between your ADF pipeline and the Fabric lakehouse, a column was renamed. You have no way to trace which of your 32 datasets depend on that column.

A renamed column and 14 broken reports

Last Tuesday, the CFO opened the Power BI report for the monthly close review. 'Sales by Region' showed a blank table. The data engineer checked Power BI Service: dataset refresh successful, no errors logged. Checked Azure Data Factory: pipeline ran, 147,000 rows copied. Checked Microsoft Fabric: data landed correctly.

Three tools. Three green statuses. One blank report.

The actual problem: a source analyst had renamed a column in the SQL Server staging table three days earlier. The ADF copy activity still ran. It wrote null to the column that fed 14 downstream reports. Without a lineage tool, the data engineer had no way to trace which reports depended on that column, or which column in the source table had changed.

Data lineage tools map how data moves through your pipeline, from source systems and ingestion jobs through transformation layers to the dashboards and reports your business runs on. When a schema change breaks something downstream, lineage tells you which reports are affected and which upstream job caused it.

This guide covers the four main categories of lineage tools, what each is built for, and how to choose one if you're running a Microsoft stack: Power BI, ADF, Fabric, and dbt.

Table-level vs. column-level lineage: the distinction that matters

Most lineage tools start at the table level. They track which pipeline reads from table A and writes to table B, and which dataset reads from table B to produce report C. That is enough to understand dependency chains: 'if source table X changes, report Y might be affected.'

Column-level lineage goes deeper. It tracks which specific columns flow from source to destination, including through each transformation step. When the CFO's report shows blank data because a region_name column was renamed, table-level lineage tells you only that the report depends on the pipeline. Column-level lineage tells you that region_name fed the visual that broke, and that the same column appears in six other reports.
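To make the difference concrete, here is a minimal Python sketch. Every table, dataset, and report name is hypothetical, and plain dictionaries stand in for whatever graph store a real lineage tool would use.

```python
# Hypothetical lineage records; every name here is illustrative only.

# Table-level: which asset feeds which asset.
TABLE_EDGES = {
    "staging.sales": ["lakehouse.sales"],
    "lakehouse.sales": ["dataset.sales_model"],
    "dataset.sales_model": ["report.sales_by_region", "report.margin_overview"],
}

# Column-level: which (asset, column) pair feeds which (asset, column) pair.
COLUMN_EDGES = {
    ("staging.sales", "region_name"): [("lakehouse.sales", "region_name")],
    ("lakehouse.sales", "region_name"): [("dataset.sales_model", "Region")],
    ("dataset.sales_model", "Region"): [("report.sales_by_region", "Region")],
    ("staging.sales", "amount"): [("lakehouse.sales", "amount")],
    ("lakehouse.sales", "amount"): [("dataset.sales_model", "Amount")],
    ("dataset.sales_model", "Amount"): [
        ("report.sales_by_region", "Amount"),
        ("report.margin_overview", "Amount"),
    ],
}


def downstream(edges, start):
    """Return every node reachable from `start`, excluding `start` itself."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for child in edges.get(node, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def reports_at_table_level(table):
    """Reports that depend on the table in any way."""
    return sorted(n for n in downstream(TABLE_EDGES, table) if n.startswith("report."))


def reports_at_column_level(table, column):
    """Reports that depend on one specific column of the table."""
    hits = downstream(COLUMN_EDGES, (table, column))
    return sorted({asset for asset, _ in hits if asset.startswith("report.")})
```

In this toy graph, reports_at_table_level("staging.sales") flags both reports, while reports_at_column_level("staging.sales", "region_name") narrows the result to report.sales_by_region only. That narrowing is what column-level lineage adds.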

The distinction matters most during schema drift: the gradual, often unannounced changes to source table structures that come from SaaS API updates, operational database schema changes, or upstream team decisions that don't propagate as breaking errors. Table-level lineage detects that a dependency exists. Column-level lineage detects that the specific field your transformation depends on has changed.
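A schema drift check can be sketched as a comparison between two snapshots of a table's columns. The snapshots and column types below are hypothetical. Note that a rename cannot be told apart from a drop plus an add when all you have is two snapshots, so the sketch reports it as both.

```python
def diff_schema(previous, current):
    """Compare two {column_name: data_type} snapshots of one table."""
    dropped = sorted(set(previous) - set(current))
    added = sorted(set(current) - set(previous))
    retyped = sorted(
        col for col in set(previous) & set(current) if previous[col] != current[col]
    )
    return {"dropped": dropped, "added": added, "retyped": retyped}


def columns_at_risk(drift, columns_in_use):
    """Columns a downstream transformation uses that were dropped or retyped."""
    changed = set(drift["dropped"]) | set(drift["retyped"])
    return sorted(changed & set(columns_in_use))


# Hypothetical snapshots of the staging table, taken a few days apart.
before = {"order_id": "int", "region_name": "nvarchar", "amount": "decimal"}
after = {"order_id": "int", "region": "nvarchar", "amount": "decimal"}

drift = diff_schema(before, after)
at_risk = columns_at_risk(drift, ["region_name", "amount"])
```

Here drift lists region_name as dropped and region as added, and at_risk contains only region_name. The second function is where column-level lineage comes in: without knowing which columns a transformation actually uses, every drift event looks equally urgent.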

For Microsoft stack teams running dbt transformations on top of Fabric lakehouses, column-level lineage also helps with impact analysis before you ship: rename a dbt model column, and you can see exactly which Power BI measures and calculated columns reference it.

What Power BI's native lineage view can and cannot do

Power BI Service has included a lineage view since 2021. In the workspace view, you can see a graph showing data sources, dataflows, datasets, and reports in a dependency chain. For a single workspace with straightforward data sources, this covers the basics: report A depends on dataset B, dataset B connects to an Azure SQL database.

The limits appear as soon as your stack crosses tool boundaries.

Power BI's lineage view does not show upstream pipeline history. If the dataset connects to a Fabric lakehouse populated by an ADF pipeline, the lineage view shows 'Lakehouse' as the source, not the ADF pipeline run that filled it or the dbt model that shaped the data before it arrived.

It does not show column-level dependencies. You know report A uses dataset B. You do not know which specific columns from dataset B power which visuals in report A.

Cross-workspace dependencies are also invisible. In organizations running development, test, and production workspaces (or when different teams own datasets in separate workspaces), the native lineage view shows only the workspace you're currently in.

For teams with a simple stack (a few datasets directly connected to Azure SQL or SharePoint), the native view covers the basics. For teams whose Power BI reports sit at the end of a chain involving ADF, dbt, and Fabric, the native view shows the last meter of a much longer path.

End-to-end lineage in the Microsoft stack: where the gaps are

A typical Microsoft data stack connects Azure Data Factory or Fabric Data Pipelines to source systems, moves data into a Fabric lakehouse or Synapse warehouse, transforms it with dbt, and surfaces it through Power BI datasets and reports. Four layers. Four distinct APIs. Four independent logging systems.

The gaps emerge at the seams. ADF logs which tables it copied and how many rows, not which columns, and not which downstream dbt models depend on that output. dbt Cloud tracks which models ran and which tests failed, but its lineage is internal to the dbt project: it is not connected to the ADF pipeline upstream or the Power BI dataset downstream. Power BI Service tracks dataset refreshes and report views, but not what changed in the Fabric lakehouse that the dataset reads from.
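Closing those seams amounts to joining each tool's records on the one thing they share: table names. The following is a rough sketch under heavy assumptions. The record shapes are invented for illustration and do not match the real ADF, dbt Cloud, or Power BI API payloads.

```python
# Hypothetical, simplified records; real API payloads have different shapes.
adf_runs = [{"pipeline": "copy_sales", "sink_table": "lakehouse.raw_sales"}]
dbt_models = [
    {"model": "fct_sales", "reads": ["lakehouse.raw_sales"], "writes": "lakehouse.fct_sales"}
]
powerbi_datasets = [{"dataset": "Sales Model", "source_tables": ["lakehouse.fct_sales"]}]


def stitch(adf_runs, dbt_models, powerbi_datasets):
    """Build (upstream, downstream) edges by matching on shared table names."""
    edges = set()
    for run in adf_runs:
        edges.add((f"adf:{run['pipeline']}", run["sink_table"]))
    for model in dbt_models:
        for table in model["reads"]:
            edges.add((table, f"dbt:{model['model']}"))
        edges.add((f"dbt:{model['model']}", model["writes"]))
    for ds in powerbi_datasets:
        for table in ds["source_tables"]:
            edges.add((table, f"powerbi:{ds['dataset']}"))
    return sorted(edges)


edges = stitch(adf_runs, dbt_models, powerbi_datasets)
```

The result is one chain of four edges running from the ADF pipeline to the Power BI dataset, which none of the three logs contains on its own. In practice the hard part is the matching key: each tool may name the same table differently, so some normalization step is usually needed before the join works.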

Microsoft Purview closes some of these gaps. It scans ADF, Power BI, Fabric, and Azure SQL sources to build a unified catalog with lineage metadata. For governance purposes (data ownership, classification, and compliance), Purview is the native Microsoft answer.

Purview's operational limit is its scan-based architecture: lineage metadata updates on a scheduled scan cycle, not in real time. If an ADF pipeline fails at 03:00 and the next Purview scan runs at 06:00, the lineage graph still reflects the previous scan's state for those three hours. That architecture suits governance queries better than real-time incident detection, such as alerting on change impact before the 07:00 Power BI refresh window.
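The timing in that example can be worked through in a few lines of Python. The times come from the scenario above; the calendar date is arbitrary and the function names are made up for this sketch.

```python
from datetime import datetime, timedelta


def stale_window(failure_at: datetime, next_scan_at: datetime) -> timedelta:
    """Time during which a scan-based graph still shows the pre-failure state."""
    return max(next_scan_at - failure_at, timedelta(0))


def reaction_time(failure_at: datetime, next_scan_at: datetime, refresh_at: datetime) -> timedelta:
    """Time left between the graph catching up and the refresh starting."""
    visible_at = max(failure_at, next_scan_at)
    return max(refresh_at - visible_at, timedelta(0))


day = datetime(2025, 1, 14)  # arbitrary date, for illustration only
failure_at = day.replace(hour=3)
next_scan_at = day.replace(hour=6)
refresh_at = day.replace(hour=7)

stale = stale_window(failure_at, next_scan_at)
left = reaction_time(failure_at, next_scan_at, refresh_at)
```

With these inputs, stale is three hours and left is one hour: of the four hours between failure and refresh, three pass before the scan-based graph can show anything. If the scan were scheduled after 07:00, the reaction time would drop to zero.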

See also: What is a data observability platform?

Four categories of data lineage tools

The lineage tool market splits into four categories that serve different use cases. Picking the right one depends more on what you need lineage for than on feature comparison scores.

Microsoft Purview: governance lineage

Purview is the right choice when your driver is governance, compliance, or catalog management rather than operational monitoring. It integrates natively with Azure services, supports data classification and sensitivity labels, and connects to Microsoft Information Protection. For teams that need to prove to auditors which systems contain PII, Purview is built for that problem. Its lineage capabilities work well as a reference layer (a map of what depends on what), queried by data stewards and compliance teams, not by on-call engineers responding to a 03:00 pipeline failure.

Collibra and Alation: enterprise data governance platforms

Both platforms build lineage as part of a broader data catalog and governance framework. They are designed for large organizations with dedicated governance teams, compliance requirements, and the engineering bandwidth to implement a full catalog. Lineage is one feature among many: data glossary, quality rules, stewardship workflows, and policy management. If your organization runs data governance at scale, with budget and team capacity allocated specifically to it, either platform can serve. If your team of 4–8 data engineers needs operational visibility into why a report broke this morning, the governance-first scope and enterprise price point are more than you need.

OpenLineage and DataHub: open source options

OpenLineage is a standard maintained by the Linux Foundation for emitting lineage events from data pipelines. DataHub is an open source data catalog that can consume those events. Integrations that emit OpenLineage events exist for dbt, Airflow, and Spark. The main requirement is instrumentation: every pipeline that should emit lineage events needs an OpenLineage client configured. For teams running Airflow with dedicated platform engineering resources, this approach gives full control over the lineage graph at no license cost. For teams running primarily managed Microsoft services (ADF, Fabric Data Pipelines, Power BI) where you cannot instrument the tool's internals, the open source path requires custom connectors that most teams do not want to build and maintain.
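For a sense of what a custom connector has to produce, the sketch below builds a single run event as a plain dictionary, using the core field names from the OpenLineage specification (eventType, eventTime, run, job, inputs, outputs, producer, schemaURL). It does not use the OpenLineage client library and does not send anything. The namespaces, job name, and producer URI are hypothetical, and the spec version in the schema URL is an assumption to check against the current specification.

```python
import json
import uuid
from datetime import datetime, timezone

# Assumption: verify the spec version against the current OpenLineage spec.
SCHEMA_URL = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent"
# Hypothetical URI identifying the code that produced the event.
PRODUCER = "https://example.com/adf-lineage-connector"


def build_run_event(event_type, job_namespace, job_name, inputs, outputs, run_id=None):
    """Build one OpenLineage-style run event as a JSON-serializable dict.

    `inputs` and `outputs` are lists of (namespace, name) tuples.
    """
    return {
        "eventType": event_type,  # e.g. START, COMPLETE, FAIL
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id or str(uuid.uuid4())},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [{"namespace": ns, "name": name} for ns, name in inputs],
        "outputs": [{"namespace": ns, "name": name} for ns, name in outputs],
        "producer": PRODUCER,
        "schemaURL": SCHEMA_URL,
    }


# Hypothetical ADF copy run, described by hand because ADF cannot be instrumented.
event = build_run_event(
    "COMPLETE",
    job_namespace="adf://my-factory",
    job_name="copy_sales",
    inputs=[("sqlserver://staging", "dbo.sales")],
    outputs=[("fabric://lakehouse", "raw_sales")],
)
payload = json.dumps(event)
```

A connector for a managed service would have to fill these fields from that service's run history and post the payload to a lineage backend, once per run and per pipeline type. That is the maintenance cost referred to above.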

MetricSign: monitoring-first lineage for the Microsoft stack

MetricSign builds lineage from activity logs rather than schema scans or code instrumentation. It reads from Power BI's activity log, ADF's pipeline run API, Fabric's job history, dbt Cloud's run API, and Databricks' job logs, and constructs a real-time lineage graph from those operational events. The primary use case is operational: when an ADF pipeline fails, MetricSign identifies which Power BI datasets depend on it and sends an alert before the next refresh runs. More detail in the section below.

| Tool | Best for | Microsoft native | Column-level | Real-time | Pricing |
| --- | --- | --- | --- | --- | --- |
| Microsoft Purview | Governance and compliance | Yes | Partial | No (scan cycle) | Azure consumption |
| Collibra / Alation | Enterprise data catalog | No | Yes | No | Enterprise (contact) |
| OpenLineage / DataHub | Platform teams with Airflow | No | Yes (with config) | Yes | Open source |
| MetricSign | Operational monitoring, Microsoft stack | Yes (via APIs) | Partial | Yes | Free tier + usage |

How to choose: three scenarios

Three paths cover most teams evaluating lineage tooling for a Microsoft stack.

Scenario 1: You need to know why a report broke and what else is affected

This is an operational use case. The right tool connects to your Power BI activity log, your ADF pipeline run history, and your dbt model runs, and correlates them into a single incident view. Purview can answer this question in principle, but only after a completed scan cycle brings its graph up to date. A monitoring-first tool answers it within minutes of the pipeline failure.

Scenario 2: Governance and compliance are driving the requirement

If your organization needs to classify data assets, track lineage for regulatory purposes, assign data stewardship, or build a business glossary, Purview is the first option to evaluate, especially if you're invested in the Azure ecosystem. For larger organizations with dedicated governance teams, Collibra or Alation add catalog depth and stewardship workflows that Purview does not include.

Scenario 3: Your team has platform engineering resources and wants maximum control

If you're running Airflow, have a data platform team that writes infrastructure code, and want a lineage graph you fully own, OpenLineage with DataHub is a viable path. Budget the instrumentation work at several weeks per pipeline type you want to emit lineage from, plus ongoing maintenance as the pipeline frameworks release new versions.

One honest observation: most data teams running a Microsoft stack (4 to 10 engineers, Power BI as the consumption layer, no dedicated platform engineering function) are best served by the operational tooling described in Scenario 1. A governance platform like Collibra is designed for an organizational context where catalog management is a full-time function.

MetricSign: lineage built from your stack's own activity logs

MetricSign builds a live lineage graph by reading the APIs your existing Microsoft tools already expose. It connects to Power BI Service to read dataset and report dependencies, to Azure Data Factory for pipeline run history and output tables, to Fabric for job history, to dbt Cloud for model run results, and to Databricks for job logs.

From those feeds, it constructs a graph: ADF pipeline X writes to Fabric lakehouse table Y, dbt model Z reads from table Y and writes to semantic model A, Power BI report B reads from semantic model A. When ADF pipeline X fails at 03:47, MetricSign traverses the graph, identifies semantic model A and Power BI report B as downstream dependents, and sends an alert before the 07:00 Power BI refresh starts.
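The traversal step can be illustrated with a generic breadth-first walk over the chain described above. This is a sketch of the technique, not MetricSign's implementation. The node identifiers and the alert wording are made up for the example.

```python
from collections import deque

# The chain from the paragraph above, with hypothetical node identifiers.
EDGES = {
    "adf:pipeline_x": ["table:y"],
    "table:y": ["dbt:model_z"],
    "dbt:model_z": ["semantic_model:a"],
    "semantic_model:a": ["report:b"],
}


def downstream_dependents(edges, failed_node):
    """Breadth-first walk; returns dependents ordered by distance from the failure."""
    order, seen, queue = [], {failed_node}, deque([failed_node])
    while queue:
        node = queue.popleft()
        for child in edges.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def build_alert(edges, failed_node):
    """Keep only the assets a report consumer would notice and format a message."""
    affected = [
        node
        for node in downstream_dependents(edges, failed_node)
        if node.startswith(("semantic_model:", "report:"))
    ]
    return f"{failed_node} failed; affected: {', '.join(affected)}"


alert = build_alert(EDGES, "adf:pipeline_x")
```

The walk visits table Y and dbt model Z on the way, but the alert names only semantic model A and report B, because those are the assets someone will open at 07:00. The seen set keeps the walk from looping or double-counting when several paths lead to the same report.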

This differs from Purview's scan-based lineage in one specific way: MetricSign operates on pipeline run events, not metadata snapshots. The lineage graph updates within minutes of a pipeline completing. For impact analysis before a schema change, a query on 'which reports depend on this dbt model column' returns a current answer, not the state from the last scan cycle.

MetricSign is free to start. It connects to Power BI and ADF in under 15 minutes without changes to your existing pipelines. The use case it's built for: something breaks in your Microsoft stack at 03:00, and you need to know which reports are affected before the business day starts at 07:00.

Frequently asked questions

What is a data lineage tool?
A data lineage tool maps how data moves through your pipeline, from source systems through transformation layers to the dashboards and reports your business relies on. It shows which upstream jobs feed which downstream datasets, and which reports break when something changes. More advanced tools also show column-level lineage, tracking exactly which fields flow through each transformation step.
What is the difference between table-level and column-level lineage?
Table-level lineage tracks which pipeline reads from table A and writes to table B, and which report depends on table B. Column-level lineage tracks exactly which columns flow from source to destination, including through dbt model transformations. Column-level lineage is necessary for schema drift impact analysis: knowing which specific column your report depends on, so you can detect when a source column is renamed or dropped before the next Power BI refresh runs.
Does Microsoft Purview provide enough lineage for Power BI and Fabric?
Purview provides lineage coverage for Power BI, ADF, Fabric, and Azure SQL through its scan-based architecture. It is well suited for governance use cases: understanding data ownership, classifying sensitive data, and tracking lineage for compliance. For operational alerting in real time, Purview's scan cycle introduces latency that makes it less effective than a monitoring-first tool.
Can I get data lineage without instrumenting my pipelines?
For managed Microsoft services like ADF, Fabric Data Pipelines, and Power BI, yes. Tools that read from native activity logs and run APIs can construct lineage without requiring you to modify your pipelines. Open source options like OpenLineage typically require emitter configuration on each pipeline type you want to track.
What is the difference between data lineage and data observability?
Data lineage maps the dependency structure of your data: what depends on what. Data observability monitors the health of that structure in real time: is data arriving on schedule, are volumes normal, did a pipeline fail. The two are complementary: lineage tells you the blast radius of a failure, and observability tells you when a failure has occurred. See: [What is a data observability platform?](/blog/what-is-a-data-observability-platform).
