How do I build data lineage from Azure Data Factory to Power BI?

Building lineage from ADF to Power BI means joining three kinds of metadata: ADF pipeline metadata (which pipelines write to which tables), Power BI datasource metadata (which tables each dataset reads), and, optionally, run timestamps to confirm that the dependency is still active.

Step 1: Inventory ADF pipeline outputs

For each ADF Copy Activity, the JSON definition contains the sink dataset configuration, which includes the destination server, database, schema, and table name. You can extract this by calling the ADF REST API's pipeline list and pipeline get endpoints, then parsing the activity sink configurations.

For ADF Data Flows with more complex transformations, the sink configuration similarly specifies the destination table.
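Step 1 can be sketched as a small parser over definitions you have already fetched from the ADF REST API (Pipelines - List/Get and Datasets - Get). The JSON field names below mirror the ADF resource shapes, but treat them as assumptions and verify against your own exported definitions; the pipeline and dataset names are made up for illustration.

```python
def extract_copy_sinks(pipeline: dict, datasets: dict) -> list:
    """Collect (activity, schema, table) for each Copy Activity sink."""
    sinks = []
    for activity in pipeline.get("properties", {}).get("activities", []):
        if activity.get("type") != "Copy":
            continue
        # Each output is a reference to a sink dataset definition.
        for ref in activity.get("outputs", []):
            ds = datasets.get(ref.get("referenceName"), {})
            props = ds.get("properties", {}).get("typeProperties", {})
            sinks.append({
                "activity": activity.get("name"),
                "schema": props.get("schema"),
                "table": props.get("table"),
            })
    return sinks

# Minimal example with hypothetical definitions:
pipeline = {"properties": {"activities": [{
    "name": "CopySales", "type": "Copy",
    "outputs": [{"referenceName": "SalesSink"}],
}]}}
datasets = {"SalesSink": {"properties": {"typeProperties": {
    "schema": "dbo", "table": "FactSales"}}}}

print(extract_copy_sinks(pipeline, datasets))
```

In practice you would also record the linked service (server and database) referenced by each sink dataset, since the match in Step 3 needs it.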

Step 2: Inventory Power BI datasource configurations

The Power BI REST API's getDatasources endpoint returns the data source connection string for each dataset. For SQL Server and Azure SQL sources, this includes the server hostname, database name, and (for table or view connections) the schema and table name.

Not all datasets return table-level detail — some return only server and database. These still provide partial lineage (dataset → database) even without the specific table.
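Step 2 then reduces to flattening each getDatasources entry into a comparable tuple. The `connectionDetails` object with `server` and `database` matches the documented response shape; a table-level field is an assumption here, since, as noted above, many sources expose only server and database.

```python
def normalize_datasource(ds: dict) -> tuple:
    """Flatten a Power BI datasource entry into (server, database, table)."""
    details = ds.get("connectionDetails", {})
    return (
        (details.get("server") or "").lower(),
        (details.get("database") or "").lower(),
        details.get("table"),  # often None: partial (dataset -> database) lineage
    )

# Hypothetical response entry:
sources = [
    {"datasourceType": "Sql",
     "connectionDetails": {"server": "Prod-SQL01.contoso.com",
                           "database": "Warehouse"}},
]
print([normalize_datasource(s) for s in sources])
```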

Step 3: Match on server, database, and table

With both inventories, match ADF pipeline destinations to Power BI datasources where server + database + table all align. Exact string matching works for most cases. Watch for variations in server names (FQDN vs. short name, presence of port) and handle them with normalization.
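A sketch of the match, with the server-name normalization described above (lowercase, strip a `",port"` suffix and a `"\instance"` suffix, compare on the short hostname). The dict shapes and names are illustrative assumptions, not a fixed schema.

```python
def normalize_server(server: str) -> str:
    """Strip port/instance suffixes and the domain for comparison."""
    host = server.split(",")[0].split("\\")[0].strip().lower()
    return host.split(".")[0]

def match_lineage(adf_sinks, pbi_sources):
    """Yield (pipeline, dataset) pairs where server+database+table align."""
    index = {}
    for s in adf_sinks:
        key = (normalize_server(s["server"]),
               s["database"].lower(), s["table"].lower())
        index.setdefault(key, []).append(s["pipeline"])
    matches = []
    for d in pbi_sources:
        key = (normalize_server(d["server"]),
               d["database"].lower(), d["table"].lower())
        for p in index.get(key, []):
            matches.append((p, d["dataset"]))
    return matches

adf = [{"pipeline": "LoadSales", "server": "prod-sql01.contoso.com",
        "database": "Warehouse", "table": "FactSales"}]
pbi = [{"dataset": "Sales Report", "server": "PROD-SQL01,1433",
        "database": "warehouse", "table": "factsales"}]
print(match_lineage(adf, pbi))
```

Note the tradeoff: comparing on the short hostname tolerates FQDN variations but can produce false positives if two domains reuse the same server name, so keep the full normalized values around for review.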

Step 4: Confirm with run timestamps

A structural match tells you the pipeline can feed the dataset — but does it actually do so on an ongoing basis? Confirm by checking: does the ADF pipeline's last successful run timestamp precede the dataset's last successful refresh timestamp? If yes, the dependency is active.
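The timestamp check itself is a one-line comparison once both timestamps are parsed. ISO 8601 strings are assumed here; the actual values would come from ADF pipeline run history and the Power BI refresh history endpoints.

```python
from datetime import datetime

def dependency_active(last_run_iso: str, last_refresh_iso: str) -> bool:
    """Active if the pipeline's last successful run completed before the
    dataset's last successful refresh (the refresh saw the new data)."""
    last_run = datetime.fromisoformat(last_run_iso)
    last_refresh = datetime.fromisoformat(last_refresh_iso)
    return last_run < last_refresh

print(dependency_active("2024-05-01T02:00:00+00:00",
                        "2024-05-01T06:30:00+00:00"))
```

A refresh that ran before the pipeline's latest run does not disprove the dependency; it only means the most recent output has not been picked up yet, so treat a failed check as "stale", not "unrelated".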

Handling indirect dependencies

Some datasets don't read from the ADF output table directly — they read from a view that joins the ADF output with other tables, or from a dbt model that transforms the ADF output. These indirect dependencies require either manual mapping or integration with the transformation tool (dbt manifest, Databricks job metadata) to trace the full chain.
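For the dbt case, tracing the chain is a graph walk over the project's manifest. `manifest.json` really does contain a `parent_map` keyed by node id, though the node ids below are made up for illustration.

```python
def upstream_chain(manifest: dict, node_id: str) -> set:
    """Collect every transitive upstream node of a dbt model."""
    seen, stack = set(), [node_id]
    while stack:
        for parent in manifest.get("parent_map", {}).get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

# Hypothetical manifest: a mart built on a staging model over an ADF output.
manifest = {"parent_map": {
    "model.proj.sales_mart": ["model.proj.stg_sales"],
    "model.proj.stg_sales": ["source.proj.adf.FactSales"],
}}
print(upstream_chain(manifest, "model.proj.sales_mart"))
```

If any node in the returned set resolves to an ADF output table from Step 1, the dataset reading the mart inherits that pipeline as an indirect dependency.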

Maintaining lineage over time

Pipeline output tables and dataset connection strings change as the environment evolves. Lineage maps need to be refreshed regularly — either by re-running the matching process periodically or by detecting changes through change data capture on the relevant metadata.
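The periodic-refresh approach reduces to re-running the matcher and diffing the new edge set against the stored one. The edge tuples here are the `(pipeline, dataset)` pairs from Step 3; the storage format is an assumption.

```python
def diff_lineage(previous, current) -> dict:
    """Report lineage edges added and removed since the last run."""
    prev, curr = set(previous), set(current)
    return {"added": curr - prev, "removed": prev - curr}

stored = [("LoadSales", "Sales Report")]
latest = [("LoadSales", "Sales Report"), ("LoadOrders", "Ops Dashboard")]
print(diff_lineage(stored, latest))
```

Surfacing the `removed` set for review is the important part: a disappeared edge usually means a renamed table or rewired datasource, which is exactly the drift the lineage map exists to catch.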
