Data Pipeline Monitoring

In-depth articles on data observability, lineage, and incident response — written for data engineers who manage Power BI, ADF, Databricks, Fabric, and dbt.

Data Observability (20) · Data Lineage (5) · Cloud Migration (2) · Best Practices (17) · Troubleshooting (12)

Data Observability

Data Observability · 8 min

Your Actual-vs-Budget Variance Visual Only Lies When the Refresh Fails Silently

Custom variance visuals like PBIGenie's Hammerhead make actual-vs-budget comparisons readable. They don't make the underlying data trustworthy.

May 25, 2026

Data Observability · 9 min

AI Agents Generate Queries Your Pipeline Monitoring Was Never Built to Trace

Copilot writes a DAX query that times out your dataset refresh. The error log says timeout. It doesn't say why the query existed in the first place.

May 18, 2026

Data Observability · 9 min

Databricks Lakebase Adds a New Failure Surface Your Pipeline Monitoring Doesn't Cover

Synced tables, scale-to-zero session drops, and metrics that report zero when data still exists — Lakebase introduces failure modes that don't map to your existing Databricks monitoring.

May 18, 2026

Data Observability · 9 min

Databricks Job Failures Leave No Breadcrumbs Unless You Build the Trail Yourself

A Databricks job fails at 3am. The cluster terminated. The driver log rolled over. The downstream dbt model ran anyway — on yesterday's data. Here is how to build the audit trail Databricks does not give you by default.

May 18, 2026

Data Observability · 8 min

Databricks Snapshot Connectors Return Stale Data Without Telling You

Query-based connectors in Databricks rely on Delta Lake snapshots that can silently age out, leaving downstream consumers reading data that looks current but isn't.

May 11, 2026

Data Observability · 10 min

Power BI Alerts: What Native Alerting Can and Can't Do

You set an alert on your Power BI revenue card. Three weeks later, the pipeline breaks, the card shows yesterday's number, and nobody gets notified.

May 10, 2026

Data Observability · 11 min

Fabric Capacity Metrics Explained: What to Monitor Before You Get Throttled

Your Fabric capacity hit 100% utilization at 06:12 this morning. The Capacity Metrics App won't show it for another 15 minutes. By then, interactive queries are already delayed.

May 10, 2026

Data Observability · 11 min

Microsoft Fabric Monitoring: What Native Tools Miss and How to Fill the Gaps

Your Lakehouse copy ran green. Capacity sits at 84%. Direct Lake served the report on time. The numbers are still wrong by €1.4M.

May 9, 2026

Data Observability · 9 min read

Data Observability Tool: 5 Capabilities That Separate Hype from Help

Vendors call almost anything an observability tool. These are the five capabilities that decide whether one will save your team or just add another dashboard to ignore.

May 7, 2026

Data Observability · 8 min read

Azure Monitor Alerts: What It Catches, What It Misses, and What to Do Next

Azure Monitor is excellent at one thing: telling you when CPU goes up. The problems that actually wake data teams at night live in the gaps between what it watches and what your business sees.

May 7, 2026

Data Observability · 9 min read

Data Monitoring System: What It Is, What It Isn't, and How to Build One That Works

Most data monitoring systems are a Slack channel, a few cron jobs, and hope. The teams that ship reliable data are the ones who build the four layers below — in this order.

May 7, 2026

Data Observability · 8 min read

Data Quality Monitoring Tools: What They Catch, What They Miss, and How to Choose One

A data quality monitoring tool tells you when a column violates a rule you wrote. It is the cheapest, fastest improvement most data teams can make. It is also where most teams stop, and that is where the trouble starts.

May 7, 2026

Data Observability · 9 min

Best Data Observability Tools and Platforms in 2026 (Compared)

Most comparisons miss the question that matters: does the platform actually cover your stack?

May 6, 2026

Data Observability · 7 min

Data Observability Platform for the Microsoft Data Stack

Power BI says the refresh succeeded. ADF reports the pipeline ran. Databricks shows all jobs completed. Your users are looking at yesterday's numbers.

May 6, 2026

Data Observability · 9 min

What Is a Data Observability Platform? (And Why Your Modern Data Stack Needs One)

Your dbt job finished. Your ADF pipeline ran. Your Power BI dashboard shows last week's numbers. Nobody got an alert.

May 5, 2026

Data Observability · 10 min

Microsoft Fabric SLA Monitoring: Why Your Alerting Architecture Breaks Before Your Pipeline Does

Fabric gives you three layers of pipeline alerting — activity-level, item-level, workspace-level — and none of them natively answers "did the file arrive on time?"

May 4, 2026

Data Observability · 14 min

Data Observability for the Microsoft Stack: Power BI, ADF, Databricks, dbt, and Fabric

Five failure layers, no single native tool that covers them, and a correlation problem that makes every incident look like three.

May 4, 2026

Data Observability · 8 min

Power BI Monitoring Beyond Refreshes: What a Data Observability Tool Actually Watches

Your refresh says succeeded. Your users see wrong data. These are the four signals a data observability tool watches that most Power BI monitoring setups miss.

April 11, 2026

Data Observability · 7 min

Why Silent Data Failures Cost More Than Outages

A failed refresh announces itself. Wrong data loaded silently does not.

April 10, 2026

Data Observability · 9 min

5 Data Observability Practices for Power BI Teams (Without a Heavy Tool)

A practical checklist for teams that want to catch data issues before their users do — without committing to a full data observability tool on day one.

April 9, 2026

Data Lineage

Data Lineage · 12 min

Data Lineage Tools: A Practical Guide for Microsoft Stack Teams

Power BI says 'refresh succeeded.' The report shows blank data. Somewhere between your ADF pipeline and the Fabric lakehouse, a column was renamed. You have no way to trace which of your 32 datasets depend on that column.

May 12, 2026

Data Lineage · 9 min

Column Lineage at Compile Time Changes What You Can Catch Before Production

Most lineage tools show you what happened. Compile-time lineage shows you what will break.

May 4, 2026

Data Lineage · 8 min

Column Lineage at Compile Time Catches What Post-Hoc Graph Crawls Miss

Rocky, a Rust-based warehouse control plane, computes column-level lineage during compilation rather than after execution. The difference determines whether you find a broken join before or after your stakeholders do.

May 4, 2026

Data Lineage · 8 min

End-to-End Data Lineage: From ADF to Power BI

Without a map of your data chain, every investigation starts from scratch.

April 8, 2026

Data Lineage · 7 min

Data Pipelines Need Lineage, Not Just Data Monitoring Software

Data monitoring software tells you what broke. Lineage tells you why — and what it's taking down with it.

April 7, 2026

Cloud Migration

Cloud Migration · 8 min

Monitoring During Cloud Migration: Why Single-Environment Data Monitoring Software Falls Short

During migration, you're not monitoring one environment — you're monitoring two. Most data monitoring software is built to watch one stack, not two stacks running side by side.

April 6, 2026

Cloud Migration · 8 min

From SSIS to ADF to Fabric: Keeping Oversight

Three generations of ETL tooling, one data stack — maintaining visibility when the tools keep changing.

April 5, 2026

Best Practices

Best Practices · 8 min

Databricks R Plots Vanish Without an Error — The Graphics Device Fails Silently

Your R code runs clean. The cell completes. The plot area is blank. Databricks doesn't tell you why — because from the runtime's perspective, nothing went wrong.

May 25, 2026

Best Practices · 9 min

Your Composite Model Is Slower Than DirectQuery Alone — Here's Why

Mixing DirectQuery with imported SharePoint lists sounds pragmatic. The storage engine disagrees.

May 18, 2026

Best Practices · 8 min

Power BI Admin Portal: What It Shows, What It Hides, and When You Need More

The dataset refreshed at 06:02. The audit log says succeeded. The board meeting starts at 09:00. The Admin Portal has nothing to tell you about the ADF pipeline that wrote zero rows at 03:44.

May 12, 2026

Best Practices · 9 min

Databricks Cannot Find Your Iceberg Table in Glue — The Catalog Configuration That Fails Silently

Six Spark properties stand between your Databricks cluster and an Iceberg table registered in AWS Glue. Get one wrong and you'll see TABLE_OR_VIEW_NOT_FOUND — with no hint about which property caused it.

May 11, 2026

Best Practices · 9 min

Databricks Vendor Access: How to Block Direct Workspace Changes Without Breaking Delivery

Your vendor's consultant just overwrote a production notebook at 4pm on a Friday. Here's how folder permissions, service principals, and Git folders prevent that from happening again.

May 11, 2026

Best Practices · 8 min

Your Databricks Compute Tab Is Missing Because of Entitlements, Not a Bug

The Compute tab vanishes silently when entitlements are wrong. Three settings control whether your users can see it, and none of them produce an error message.

May 11, 2026

Best Practices · 8 min

Delta MERGE From Multiple Source Tables Fails Because UNION ALL Isn't Enough

A UNION ALL in the USING clause looks correct until two source tables contribute a row for the same key. Delta rejects the ambiguity outright.

May 4, 2026

Best Practices · 7 min

PySpark split() Silently Drops Data When Your Delimiter Assumption Is Wrong

The split-and-getItem pattern works perfectly on sample data. Production strings have trailing spaces, embedded delimiters, and missing fields that turn your columns into nulls without warning.

May 4, 2026

Best Practices · 8 min

Delta MERGE from Multiple Source Tables Fails When You Skip Deduplication

UNION ALL your sources into MERGE and Spark will punish you with an ambiguous match error — unless you deduplicate first.

May 4, 2026

Best Practices · 14 min

Power BI Monitoring Tools Compared: The 2026 Buyer's Guide

Native notifications miss the failures that actually hurt. Here's how the major Power BI monitoring tools compare on detection, correlation, and time-to-deploy.

May 4, 2026

Best Practices · 9 min

ADF Pipeline Failure Monitoring: Where Native Alerts Stop Working

Native Azure Monitor catches pipeline failures. It misses the Copy activity that succeeded with the wrong schema — and that's the one your stakeholders will call about.

May 4, 2026

Best Practices · 9 min

Spark Performance: Scala vs Python Where It Actually Matters

The runtime gap between PySpark and Scala is not what most benchmarks measure. The real cost lives in serialization boundaries, executor process model, and where your UDFs run.

April 26, 2026

Best Practices · 9 min

Microsoft Fabric Copy Job: Failure Modes Beginners Hit in Production

The tutorial shows a green checkmark. Production shows a half-loaded Lakehouse table and a stakeholder asking why yesterday's revenue is missing.

April 26, 2026

Best Practices · 8 min

How to Get Notified When a Power BI Dataset Refresh Fails

Power BI has built-in refresh failure notifications. They're not enough for most production environments.

April 25, 2026

Best Practices · 8 min

Power BI Scheduled Refresh Fails But Manual Refresh Works: Root Causes and Fixes

If manual refresh works and scheduled refresh fails, the problem is not the data source. It is the environment the scheduled run uses.

April 25, 2026

Best Practices · 9 min

Power BI On-Premises Gateway Offline: Causes, Diagnostics, and Fixes

A gateway that goes offline at 02:00 and recovers by 09:00 can silently fail dozens of scheduled refreshes while everyone sleeps.

April 25, 2026

Best Practices · 9 min

Incident Response for Data Pipeline Failures: A Data Pipeline Management Playbook

What do you do when it's 3am and your most important dataset just failed to refresh? A data pipeline management playbook for the moment monitoring fires its first alert.

April 4, 2026

Troubleshooting

Troubleshooting · 8 min

Your Databricks Reconciliation Job Runs Forever Because It Has No Reason to Stop

Reconciliation workloads compare two large datasets row by row. When that comparison never converges, your cluster burns compute until someone notices — or the budget runs out.

May 11, 2026

Troubleshooting · 9 min

dbt Production Errors: A Reference Index of Run Failures

Your dbt run finished at 04:12. Three models failed. The error log says 'current transaction is aborted'. Downstream, Power BI already refreshed on yesterday's data.

May 10, 2026

Troubleshooting · 11 min

Power BI Authentication Errors: A Reference Index of AADSTS Codes

Your scheduled refresh failed with an AADSTS code. The dashboard still shows yesterday's numbers. Here is how to read the code and find the right fix without trawling the full Microsoft reference.

May 9, 2026

Troubleshooting · 12 min

Azure Data Factory Pipeline Errors: A Reference Index of Common Failures

Your ADF pipeline failed at 03:42 with a UserError code that means nothing on its own. The Power BI refresh that depends on it is two hours away. Here is how to read the error class and jump to the fix.

May 9, 2026

Troubleshooting · 9 min

VBA + ADODB Queries Against Power BI Lose Rows Without Telling You

Your DAX query returns 11,000 rows in DAX Studio and 6,000 through VBA. No error. No warning. Just missing data your stakeholders will find before you do.

May 4, 2026

Troubleshooting · 9 min

VBA Queries Against Power BI XMLA Endpoints Silently Drop Rows

Your DAX query returns 11,000 rows in DAX Studio but 6,000 through VBA. The query isn't wrong. The ADODB plumbing is.

May 4, 2026

Troubleshooting · 9 min

Lakeflow Connect SQL Server: Why the Database Setup Step Keeps Failing

The setup wizard looks simple. Four steps, a few stored procedures, done. But the database setup step fails without telling you which prerequisite it actually checked and rejected.

April 27, 2026

Troubleshooting · 10 min

AADSTS Errors in Power BI Scheduled Refresh: Causes and Fixes

Your scheduled refresh failed at 06:00. The error message contains an AADSTS code. Here's what each one means.

April 25, 2026

Troubleshooting · 9 min

Power BI Gateway Errors: DM_GWPipeline Codes Explained

A DM_GWPipeline error means the gateway is part of the problem. Here's how to find out which part.

April 25, 2026

Troubleshooting · 9 min

ADF Pipeline Permission Errors: Access Denied, 401, and 403 Fixes

The connection test passes. The pipeline run fails with 403. They are not the same thing.

April 25, 2026

Troubleshooting · 9 min

Databricks Job Failures: OOM, Data Skew, and DRIVER_NOT_RESPONDING

DRIVER_NOT_RESPONDING is a symptom. The cause is almost always memory pressure or GC pause. Here is how to find it and fix it.

April 25, 2026

Troubleshooting · 9 min

dbt Run Failures in Production: Permission Errors, SQL Failures, and Incremental Drift

The model works locally. The production deployment fails. The difference is almost always permissions, credentials, or SQL dialect.

April 25, 2026

Stop finding out from your users

MetricSign monitors your Power BI datasets, ADF pipelines, Databricks jobs, Fabric Pipelines, and dbt models — and surfaces incidents with root cause context before your stakeholders notice.