MetricSign

Monitoring During Cloud Migration: Why Single-Environment Data Monitoring Software Falls Short

During migration, you're not monitoring one environment — you're monitoring two. Most data monitoring software is built to watch one stack, not two stacks running side by side.



During migration, you're monitoring two environments — most tools cover one

Data monitoring software designed for a steady-state environment becomes a liability during cloud migration. The pattern we see: teams pick a tool that fits their target stack (often Azure-native), then realize the legacy on-prem half has gone dark.

Cloud migration projects rarely have a clean cutover date. The reality for most organizations moving from on-premises SQL Server and SSIS to Azure SQL and ADF is a months-long period where both environments are running production workloads simultaneously. Some pipelines have migrated. Others haven't. Some Power BI datasets still refresh from on-premises sources via the data gateway. Others have been redirected to Azure.

The monitoring challenge is structural: most tools are built for one environment or the other. Azure Monitor covers your ADF pipelines and Azure SQL. SQL Server Agent covers your SSIS jobs. Power BI's built-in monitoring covers your datasets but doesn't show you the upstream pipeline status that caused a failure. None of them give you a single view of both environments simultaneously.

This gap produces a predictable failure pattern. An SSIS package fails on-premises. The ADF pipeline that was supposed to replace it hasn't been deployed yet. The Power BI refresh that depends on that data runs, loads stale data, and reports success. You find out when a business user notices the numbers are old. By that point, the on-premises failure is hours old.
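The stale-but-successful refresh can be caught by comparing each refresh against the last successful load of the tables it reads. A minimal sketch, assuming you already collect refresh results (for example from Power BI refresh history) and a last-success timestamp per upstream table from SSIS and ADF run logs; the class names, the `"Completed"` status string, the table names and the 24-hour lag threshold are illustrative assumptions, not a specific product API:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class RefreshRun:
    dataset: str
    status: str            # assumed values: "Completed" or "Failed"
    finished_at: datetime


@dataclass
class UpstreamLoad:
    table: str
    last_success_at: datetime  # last successful SSIS or ADF load into this table


def stale_successes(refresh: RefreshRun, loads: List[UpstreamLoad],
                    max_lag: timedelta) -> List[str]:
    """Return upstream tables that were already stale when a 'successful' refresh finished."""
    if refresh.status != "Completed":
        return []  # failed refreshes already raise their own alert
    cutoff = refresh.finished_at - max_lag
    return [load.table for load in loads if load.last_success_at < cutoff]


# Hypothetical example: the overnight SSIS load into stg_orders failed,
# but the 07:00 Power BI refresh still reported success.
run = RefreshRun("Sales", "Completed", datetime(2024, 5, 6, 7, 0))
loads = [
    UpstreamLoad("stg_orders", datetime(2024, 5, 5, 2, 0)),
    UpstreamLoad("stg_customers", datetime(2024, 5, 6, 2, 0)),
]
print(stale_successes(run, loads, max_lag=timedelta(hours=24)))  # ['stg_orders']
```

The design choice is to treat a successful refresh as suspect whenever an upstream table missed its load, which is exactly the case that otherwise surfaces only when a business user notices.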

The data gateway is your biggest single point of failure during migration

The on-premises data gateway is the bridge between your legacy on-premises systems and Azure. During migration, almost every hybrid Power BI dataset flows through it. When the gateway goes down, every on-premises-connected dataset stops refreshing simultaneously — silently, without error, until the scheduled refresh time arrives and finds no gateway available.

The gateway runs as a Windows service. It has a known set of failure modes: service crashes due to memory pressure, certificate expiration after 90 days (which kills the HTTPS tunnel without warning), network policy changes that block outbound connections to Azure, and Windows updates that require a restart without automatically restarting the service afterward.

Gateway health is not visible in Power BI Service until a refresh actually fails. Proactive monitoring requires checking the gateway health endpoint directly on the machine running the gateway service — a simple HTTP check that returns current status, certificate expiry date, and connection health. Running this check every five minutes gives you advance warning of certificate expiration and catches service crashes before they affect scheduled refreshes.
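The five-minute check can be sketched as a small evaluator plus a polling loop. This is a minimal sketch under stated assumptions: how you obtain the health snapshot is left to a placeholder `fetch_snapshot` callable, and the snapshot keys `service_status`, `cert_expires_at` and `azure_connected`, the `"Running"` status value and the 14-day warning window are hypothetical names chosen for illustration, not a documented gateway API:

```python
import time
from datetime import datetime, timedelta, timezone
from typing import Any, Callable, Dict, List

CHECK_INTERVAL_SECONDS = 5 * 60              # every five minutes
CERT_WARNING_WINDOW = timedelta(days=14)     # hypothetical lead time before expiry


def evaluate_gateway_health(snapshot: Dict[str, Any], now: datetime) -> List[str]:
    """Turn one health snapshot into alert messages; an empty list means healthy.

    The keys used here are hypothetical; map them to what your check returns.
    cert_expires_at must be timezone-aware if `now` is.
    """
    alerts = []
    status = snapshot.get("service_status")
    if status != "Running":
        alerts.append(f"gateway service not running (status: {status})")
    expires_at = snapshot.get("cert_expires_at")
    if expires_at is None:
        alerts.append("certificate expiry date unknown")
    elif expires_at <= now:
        alerts.append("certificate has expired")
    elif expires_at - now <= CERT_WARNING_WINDOW:
        alerts.append(f"certificate expires in {(expires_at - now).days} days")
    if not snapshot.get("azure_connected", False):
        alerts.append("no outbound connection to Azure")
    return alerts


def poll_forever(fetch_snapshot: Callable[[], Dict[str, Any]],
                 notify: Callable[[str], None]) -> None:
    """Run the check on a fixed interval. Both callables are placeholders you supply."""
    while True:
        try:
            alerts = evaluate_gateway_health(fetch_snapshot(), datetime.now(timezone.utc))
        except Exception as exc:  # an unreachable health check is itself worth an alert
            alerts = [f"health check failed: {exc}"]
        for message in alerts:
            notify(message)
        time.sleep(CHECK_INTERVAL_SECONDS)
```

Treating a failed check as an alert in its own right matters here: a crashed service or a blocked outbound connection may show up only as the check itself failing.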

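Clusters of simultaneous failures across all on-premises-connected datasets are almost always a gateway failure, not individual pipeline problems. Lineage is the fastest indicator: if every dataset that failed together shares the gateway as its common upstream dependency, start with the gateway.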
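One way to make that pattern checkable is a sliding-window heuristic over refresh failures. A minimal sketch, assuming each failure record is tagged with whether the dataset reaches its source through the gateway; the 30-minute window, the 50% threshold and the dataset names are illustrative assumptions to tune against your own failure history:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class RefreshFailure:
    dataset: str
    failed_at: datetime
    via_gateway: bool  # True if the dataset reaches its source through the on-prem gateway


def likely_gateway_outage(failures: List[RefreshFailure], gateway_dataset_count: int,
                          window: timedelta = timedelta(minutes=30),
                          threshold: float = 0.5) -> bool:
    """True if at least `threshold` of all gateway-connected datasets failed within one `window`."""
    if gateway_dataset_count <= 0:
        return False
    gw = sorted((f for f in failures if f.via_gateway), key=lambda f: f.failed_at)
    start = 0
    for end in range(len(gw)):
        # Slide the left edge forward until the window spans at most `window`.
        while gw[end].failed_at - gw[start].failed_at > window:
            start += 1
        datasets = {f.dataset for f in gw[start:end + 1]}
        if len(datasets) / gateway_dataset_count >= threshold:
            return True
    return False


# Hypothetical burst: 6 of 10 gateway-connected datasets fail within 5 minutes.
t0 = datetime(2024, 5, 6, 6, 0)
burst = [RefreshFailure(f"ds{i}", t0 + timedelta(minutes=i), True) for i in range(6)]
print(likely_gateway_outage(burst, gateway_dataset_count=10))  # True
```

Failures spread over hours, or failures in cloud-only datasets, do not trip the heuristic, which keeps ordinary pipeline problems out of the gateway bucket.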

[Figure: the on-premises legacy stack (SSIS ETL pipelines, SQL Server on-prem DW, on-prem data gateway) and the new cloud stack (ADF / Fabric pipelines, Synapse / Fabric cloud DW) both feed a dual-source Power BI dataset; the monitoring gap is the handoff zone between them, which neither tool covers while both stacks run simultaneously.]
During migration, both stacks run in parallel. The handoff zone between them is the monitoring blind spot.

Credential surface area is largest exactly when you're most distracted

Credential management is the category most likely to cause an incident during migration — and it's the one most likely to be deferred because it feels like a bookkeeping problem rather than an engineering one.

At the start of a migration project, credentials exist in a small number of places: SSIS package connection managers, SQL Server Agent job steps, and Power BI dataset credentials in the admin portal. Over the course of the migration, the surface expands considerably. New ADF linked services are created for each migrated pipeline. Azure Key Vault references are added. Power BI dataset credentials are updated to point at Azure endpoints. Some old SSIS packages still run from on-premises and reference the original credentials.

The failure mode is predictable: a credential that worked fine on-premises gets updated in Azure, but the on-premises reference isn't cleared. Or a Key Vault secret is rotated as part of a security policy update, and the ADF linked service that references it isn't refreshed. Or a Power BI dataset credential for a migrated source expires because nobody updated the renewal schedule for the Azure endpoint.

Migration is the right moment to create a single credential inventory — a list of every connection string, service account, and managed identity that any pipeline or dataset uses, where it's stored, and when it was last validated. The inventory doesn't need to be sophisticated. A spreadsheet with the right columns prevents the class of incident where a credential expires because it wasn't in anyone's maintenance rotation.
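As an illustration of "the right columns", here is a minimal sketch that reads such an inventory exported as CSV and lists the rows that need attention. The column names, the sample rows and the 90-day and 30-day thresholds are hypothetical; adapt them to whatever your spreadsheet actually tracks:

```python
import csv
import io
from datetime import date, timedelta
from typing import Dict, List

# Hypothetical inventory: one row per connection string, service account or managed identity.
INVENTORY_CSV = """name,kind,used_by,stored_in,last_validated,expires_on
sql-onprem-etl,service account,SSIS: LoadSales,SSIS connection manager,2024-01-10,
kv-sql-azure,connection string,ADF: ls_AzureSql,Azure Key Vault,2024-04-28,2024-06-01
pbi-sales-azure,connection string,Power BI: Sales,Power BI dataset settings,,2024-05-20
"""


def credentials_needing_attention(rows: List[Dict[str, str]], today: date,
                                  max_age: timedelta = timedelta(days=90),
                                  expiry_warning: timedelta = timedelta(days=30)) -> List[str]:
    """Flag credentials never validated, validated too long ago, or expiring soon."""
    findings = []
    for row in rows:
        name = row["name"]
        if not row["last_validated"]:
            findings.append(f"{name}: never validated")
        elif today - date.fromisoformat(row["last_validated"]) > max_age:
            findings.append(f"{name}: last validated over {max_age.days} days ago")
        if row["expires_on"] and date.fromisoformat(row["expires_on"]) - today <= expiry_warning:
            findings.append(f"{name}: expires on {row['expires_on']}")
    return findings


rows = list(csv.DictReader(io.StringIO(INVENTORY_CSV)))
for finding in credentials_needing_attention(rows, today=date(2024, 5, 6)):
    print(finding)
```

Running this on a schedule turns the spreadsheet from a one-off document into part of someone's maintenance rotation.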

Schedule conflicts hide in plain sight

Before migration, the data loading schedule probably lived in one place: SQL Server Agent jobs. The schedule was straightforward to audit.

During migration, schedules multiply. SSIS packages run from SQL Server Agent. ADF pipelines run from ADF triggers. Some Power BI refreshes are scheduled in Power BI Service. Some are triggered programmatically after an ADF pipeline completes. The dependencies between these schedules — SSIS runs first, then ADF picks up its output, then Power BI refreshes — are not documented anywhere central.
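Even before the dependencies are documented centrally, the expected order can be checked mechanically. A minimal sketch, assuming you can list each job's scheduled start, a typical duration (for example a high percentile of recent runs) and its upstream jobs; the job names, durations and 15-minute buffer are hypothetical:

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List


@dataclass
class ScheduledJob:
    name: str
    start: datetime                  # scheduled start on a reference day
    typical_duration: timedelta      # e.g. a high percentile of recent run times
    upstream: List[str] = field(default_factory=list)


def misaligned_dependencies(jobs: List[ScheduledJob],
                            buffer: timedelta = timedelta(minutes=15)) -> List[str]:
    """Flag jobs scheduled to start before an upstream job normally finishes (plus a buffer)."""
    by_name = {job.name: job for job in jobs}
    problems = []
    for job in jobs:
        for upstream_name in job.upstream:
            upstream = by_name[upstream_name]
            usual_end = upstream.start + upstream.typical_duration
            if job.start < usual_end + buffer:
                problems.append(f"{job.name} starts {job.start:%H:%M}, but "
                                f"{upstream.name} usually finishes around {usual_end:%H:%M}")
    return problems


day = datetime(2024, 5, 6)
jobs = [
    ScheduledJob("SSIS LoadSales", day.replace(hour=1), timedelta(minutes=90)),
    ScheduledJob("ADF pl_TransformSales", day.replace(hour=2, minute=45),
                 timedelta(minutes=40), ["SSIS LoadSales"]),
    ScheduledJob("Power BI Sales refresh", day.replace(hour=3),
                 timedelta(minutes=20), ["ADF pl_TransformSales"]),
]
for problem in misaligned_dependencies(jobs):
    print(problem)
```

This sketch compares start times on a single reference day; schedules that cross midnight would need a real date for each run.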

Schedule conflicts arise when two components try to process the same data at the same time. An ADF pipeline triggers while an SSIS package is still writing to the same staging table. A Power BI refresh starts while ADF is mid-copy into the source table. The data loaded is partial. The refresh succeeds. The report is wrong.

The most reliable way to surface schedule conflicts is to visualize all schedules together before they cause an incident. A timeline showing SSIS job times, ADF trigger times, and Power BI refresh windows, overlaid, makes conflicts visible that are invisible when each is viewed separately. This audit takes an afternoon and prevents the class of problem that takes a day to diagnose.
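The overlay can also be checked automatically. A minimal sketch of the same idea, assuming each SSIS job, ADF trigger and Power BI refresh has been reduced to a time window plus the table it writes to or reads from (the job and table names are hypothetical). It groups windows by table and reports every overlapping pair:

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass
class Window:
    job: str         # SSIS job, ADF trigger or Power BI refresh
    table: str       # table the job writes to or reads from
    start: datetime
    end: datetime


def overlapping_windows(windows: List[Window]) -> List[Tuple[str, str, str]]:
    """Return (table, job_a, job_b) for every pair of windows that overlap on the same table."""
    by_table: Dict[str, List[Window]] = defaultdict(list)
    for window in windows:
        by_table[window.table].append(window)
    conflicts = []
    for table, group in by_table.items():
        group.sort(key=lambda w: w.start)
        for i, a in enumerate(group):
            for b in group[i + 1:]:
                if b.start >= a.end:
                    break  # sorted by start, so no later window can overlap a either
                conflicts.append((table, a.job, b.job))
    return conflicts


day = datetime(2024, 5, 6)
windows = [
    Window("SSIS LoadSales", "stg.Sales", day.replace(hour=1), day.replace(hour=2, minute=30)),
    Window("ADF pl_CopySales", "stg.Sales", day.replace(hour=2), day.replace(hour=2, minute=40)),
    Window("Power BI Sales refresh", "dw.FactSales", day.replace(hour=3), day.replace(hour=3, minute=20)),
]
print(overlapping_windows(windows))
# [('stg.Sales', 'SSIS LoadSales', 'ADF pl_CopySales')]
```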

Define what healthy looks like at each migration stage before you need to prove it

Migration monitoring needs to track progress, not just failures. At each stage of the migration, defining explicitly what healthy looks like gives you something to verify against — and gives you a defensible baseline when a business stakeholder asks whether the migration is on track.

At the pre-migration baseline stage, healthy means: all on-premises pipelines running on schedule, all Power BI datasets refreshing successfully, all refresh durations within normal range. This baseline is your reference point for comparison throughout the migration.
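What counts as "normal range" is a choice worth making explicitly. A minimal sketch that uses the mean plus or minus three standard deviations of each dataset's baseline refresh durations; the three-sigma rule, the dataset name and the sample durations are assumptions, and a percentile band is an equally reasonable choice when durations are skewed:

```python
import statistics
from typing import Dict, List, Tuple


def duration_band(history_minutes: List[float], k: float = 3.0) -> Tuple[float, float]:
    """Normal range as mean +/- k standard deviations (needs at least two samples)."""
    mean = statistics.mean(history_minutes)
    sd = statistics.stdev(history_minutes)
    return max(0.0, mean - k * sd), mean + k * sd


def outside_baseline(bands: Dict[str, Tuple[float, float]],
                     latest_minutes: Dict[str, float]) -> List[str]:
    """Datasets whose latest refresh duration falls outside their baseline band."""
    flagged = []
    for dataset, minutes in latest_minutes.items():
        low, high = bands[dataset]
        if not low <= minutes <= high:
            flagged.append(dataset)
    return flagged


# Hypothetical pre-migration history for one dataset, in minutes.
bands = {"Sales": duration_band([10, 12, 11, 13, 12, 11, 12])}
print(outside_baseline(bands, {"Sales": 12}))  # []
print(outside_baseline(bands, {"Sales": 30}))  # ['Sales']
```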

During parallel running, healthy means: all migrated pipelines producing output that matches the on-premises equivalent within a defined tolerance. Row count parity between on-premises and Azure pipelines for the same time window is the most reliable check. A migrated ADF pipeline that loads 48,000 rows while the SSIS equivalent loads 52,000 rows needs investigation before the SSIS version is retired.
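A parity check along those lines is a single ratio. In the example above, the gap is |48,000 - 52,000| / 52,000 = 4,000 / 52,000, or about 7.7% of the on-premises count. A minimal sketch, assuming the on-premises count is the reference and using a hypothetical 0.5% tolerance (the right tolerance depends on whether late-arriving rows are expected):

```python
from typing import Tuple


def row_count_parity(onprem_rows: int, azure_rows: int,
                     tolerance: float = 0.005) -> Tuple[bool, float]:
    """Compare a migrated pipeline's row count with the on-prem count for the same window.

    Returns (within_tolerance, relative_difference), where the difference is
    measured against the on-premises count, which is treated as the reference.
    """
    if onprem_rows == 0:
        # No reference rows: only an empty Azure load counts as a match.
        return (azure_rows == 0, 0.0 if azure_rows == 0 else float("inf"))
    difference = abs(azure_rows - onprem_rows) / onprem_rows
    return (difference <= tolerance, difference)


ok, difference = row_count_parity(onprem_rows=52_000, azure_rows=48_000)
print(ok, f"{difference:.1%}")  # False 7.7%
```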

At cutover, healthy means: all ADF pipelines running on schedule, all Power BI datasets refreshing within expected windows, gateway health confirmed for any remaining on-premises connections, and no increase in refresh error rate compared to the parallel running period.
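The cutover criteria translate naturally into a checklist that returns whatever is not yet met. A minimal sketch, assuming you already produce the lists of late ADF pipelines, late Power BI refreshes and gateway alerts elsewhere; the sample numbers are hypothetical, and comparing raw error rates is a deliberate simplification, since with only a few dozen refreshes in either period a single failure moves the rate noticeably:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RefreshStats:
    total: int
    failed: int

    @property
    def error_rate(self) -> float:
        return self.failed / self.total if self.total else 0.0


def unmet_cutover_criteria(late_pipelines: List[str], late_refreshes: List[str],
                           gateway_alerts: List[str], parallel: RefreshStats,
                           cutover: RefreshStats) -> List[str]:
    """Return the cutover criteria that are not met; an empty list means healthy."""
    unmet = []
    if late_pipelines:
        unmet.append("ADF pipelines off schedule: " + ", ".join(late_pipelines))
    if late_refreshes:
        unmet.append("Power BI refreshes outside window: " + ", ".join(late_refreshes))
    if gateway_alerts:
        unmet.append("gateway issues: " + "; ".join(gateway_alerts))
    if cutover.error_rate > parallel.error_rate:
        unmet.append(f"refresh error rate rose from {parallel.error_rate:.1%} "
                     f"to {cutover.error_rate:.1%}")
    return unmet


# Hypothetical numbers: 8 of 400 refreshes failed during parallel running,
# 6 of 120 in the first week after cutover.
print(unmet_cutover_criteria([], [], [], RefreshStats(400, 8), RefreshStats(120, 6)))
# ['refresh error rate rose from 2.0% to 5.0%']
```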

Defining these checkpoints before the migration starts makes it possible to catch regressions early, rather than on the first Monday morning after cutover, when every stakeholder is watching.

Frequently asked questions

Why is monitoring harder during cloud migration?
Most monitoring tools cover one environment, not two. During migration, SSIS runs on-premises, ADF runs in Azure, and Power BI datasets may connect to either. No native tool provides a unified view, so failures in the on-premises side often don't surface until a downstream Power BI refresh fails.
Why is the on-premises data gateway a critical monitoring target during migration?
During migration, every hybrid Power BI dataset flows through the gateway. A gateway failure silently stops all on-premises-connected refreshes simultaneously. Gateway failures don't appear in Power BI Service until a scheduled refresh runs and finds no gateway — proactive health monitoring on the gateway machine itself is the only way to detect issues before they affect datasets.
How should credential management be handled during cloud migration?
Create a credential inventory before migration begins: every connection string, service account, and managed identity used by any pipeline or dataset, where it's stored, and when it was last validated. Update this inventory as new ADF linked services and Key Vault references are created throughout the project.
What schedule conflicts occur during cloud migration?
Two categories: temporal conflicts (SSIS and ADF processing the same staging table at the same time) and dependency misalignment (Power BI refreshes scheduled before the upstream ADF pipeline completes). Both are visible when SSIS job times, ADF trigger times, and Power BI refresh windows are visualized together on a single timeline.
How do you define healthy at each migration stage?
Pre-migration: all on-premises pipelines and Power BI refreshes running on schedule as a documented baseline. During parallel running: row count parity between on-premises and Azure pipelines within defined tolerance. At cutover: ADF pipelines on schedule, refresh error rate no higher than the parallel running period.
