Data Observability7 min·June 3, 2026

Data Quality Monitoring for Snowflake: What Queries Succeeding Doesn't Guarantee

The Snowflake query ran. The ADF pipeline completed. The Power BI dataset refreshed. The numbers are from two days ago. Every tool reported success.

Key Takeaways

✓Snowflake monitors query execution, not data quality — a Succeeded task status does not confirm that the loaded data is fresh, complete, or correctly shaped.
✓Watermark checks (reading MAX(updated_at) against a freshness threshold) are the fastest high-value addition to Snowflake monitoring for tables with timestamp columns.
✓Volume baseline monitoring — comparing row counts against a day-of-week historical baseline — catches partial loads that appear fresh but are missing rows.
✓Schema drift monitoring compares column structure after each load and alerts when columns appear, disappear, or change type — before downstream BI consumers break.
✓Effective Snowflake data quality monitoring runs at the output layer (what did the table receive?), not at the execution layer (did the query run?).

What Snowflake monitors and what it doesn't

Snowflake is a cloud data warehouse. Its native observability covers query execution: history, performance, compute credits consumed, warehouse utilization, and task run status. These are the right signals for operational database management — they tell you whether queries are running efficiently and whether scheduled tasks completed.

They do not tell you: - Whether the data loaded into a table is fresh relative to the source system - Whether today's batch load brought fewer rows than expected - Whether a column that was present in last week's schema is missing today - Whether the output that downstream consumers are reading is correct

This is the monitoring gap in Snowflake. The warehouse is doing its job — executing queries, returning results, running tasks. The data quality problems are upstream: the source wrote partial data, the ETL pipeline applied a wrong filter, the schema changed without notice. Snowflake faithfully stores and serves whatever arrived.

Effective data quality monitoring for Snowflake means adding checks at the output layer — what did the table actually receive — rather than at the execution layer. A data observability platform that connects to Snowflake checks freshness, volume, and schema on a schedule, independent of whether queries ran successfully.

The three failure modes Snowflake won't surface

Stale data from a slow source. A Fivetran sync or ADF pipeline copies Salesforce data to Snowflake at 02:00 every night. The Salesforce API rate-limiting causes the sync to time out after processing records through the previous afternoon. The pipeline marks the sync as completed. Snowflake has data — just data from 14 hours ago, not 2 hours ago. The CURRENT_TIMESTAMP() of the Snowflake query is accurate; the MAX(created_at) of the loaded records is not.

Zero-row or partial-row loads. A Snowflake task executes an INSERT INTO ... SELECT statement. If the SELECT returns zero rows (because a date filter matched nothing, a source table was truncated, or a JOIN produced no output), the INSERT completes successfully with zero rows written. Snowflake executed the task. The task history shows SUCCEEDED. The table is unchanged, and whatever Power BI or downstream BI consumers are reading is as stale as the last non-empty load.

Schema drift in the source. An upstream system renames a column. The ETL pipeline uses a wildcard (SELECT *) or does not enforce column-level validation. The renamed column is absent from the next load. Depending on how Snowflake's COPY command and the table definition interact, this results in: a NULL-filled column (if COPY uses FILL_DEFAULT), a missing column in a variant payload (if data is ingested as JSON), or a load error (if schema enforcement is strict). Only the last case generates a visible error. The first two succeed silently.

Watermark monitoring: the simplest high-value check

The watermark check is the fastest way to add meaningful data quality monitoring to Snowflake tables that have a timestamp column.

The query is straightforward:

SELECT MAX(updated_at) AS watermark,
       DATEDIFF('hour', MAX(updated_at), CURRENT_TIMESTAMP()) AS hours_since_last_update
FROM analytics.fact_orders;

Run this after each expected pipeline completion. Compare the result against a threshold: for a table that should be updated daily by 06:00, a watermark older than 30 hours at 07:00 is an anomaly worth alerting on.

The check costs one Snowflake query per table per monitoring interval. On a compressed Snowflake table with a good partition key, this is seconds. The value is catching the 'pipeline ran but source was stale' failure before stakeholders open their reports.

For tables without a reliable timestamp column — append-only or event tables where CURRENT_TIMESTAMP() is injected at load time — use the Snowflake INFORMATION_SCHEMA.TABLE_STORAGE_METRICS view to check LAST_ALTERED, which reflects the last DML operation on the table. This is a lighter-weight proxy, though it cannot distinguish between a meaningful data update and a metadata-only change.

Teams running dbt Snowflake monitoring often implement watermark checks as dbt source freshness configurations — a clean pattern for transformation-layer freshness that runs as part of the dbt build process.

Volume baselines: detecting the partial load that passes every check

A table can have a fresh watermark and still be missing 40% of its expected rows. This happens when a pipeline loads a subset of records correctly — the most recent timestamp is current, but the batch was truncated or the source query returned fewer records than normal.

Volume monitoring addresses this with a baseline comparison:

Build a rolling baseline of expected row counts per table per day of week (4–6 weeks of history gives a stable baseline for most tables)
After each load, compare the current count against the baseline for the same day of week
Alert when the deviation exceeds a configured threshold — typically 15–25% for stable tables

The day-of-week calibration matters. A Snowflake fact table might receive 100,000 rows on a Tuesday and 60,000 rows on a Sunday — both correct. A flat threshold of '20% below Monday's baseline' would fire false positives every Sunday. The right comparison is Monday vs Monday, Sunday vs Sunday.

For tables loaded by dbt Cloud or ADF into Snowflake, MetricSign tracks these baselines automatically after each pipeline run, without requiring manual threshold configuration per table.

Schema monitoring: catching the renamed column before Power BI shows blanks

Schema drift in Snowflake tables is a pipeline problem that typically originates upstream: a source system renames a field, the extraction layer does not validate the schema, and the new column structure arrives in Snowflake without warning.

Snowflake does not detect or alert on schema changes in loaded data by default. The SHOW COLUMNS command on a table gives you the current schema, but comparing it against the previous schema requires external tooling or a scheduled SQL check.

A schema monitoring check:

-- Run this after each load and compare to previous results
SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'ANALYTICS'
  AND TABLE_NAME = 'FACT_ORDERS'
ORDER BY ORDINAL_POSITION;

Saving the output after each run and comparing it to the previous snapshot catches column additions, removals, and type changes. Any difference is worth alerting on, because schema changes in a Snowflake table consumed by Power BI or Tableau will break visuals downstream.

This connects directly to the broader picture of data quality monitoring versus observability. Schema monitoring is an observability check — you are not asserting what the schema should be (that is a quality rule), you are watching whether it changed (that is an observability signal). For a deeper look at how the two categories interact in practice, see data quality monitoring tools.

Frequently asked questions

What does Snowflake data quality monitoring cover natively?+

Snowflake provides query history, warehouse utilization, task run status, and access logs. These cover operational health. They do not cover data freshness (how old is the data in each table?), volume anomalies (did the expected number of rows arrive?), or schema drift (did a column change or disappear?).

How do you check data freshness in Snowflake?+

Run SELECT MAX(updated_at) FROM your_table and compare the result to the current time. If the gap exceeds your freshness threshold (e.g., more than 30 hours for a daily batch table), the data is stale. For tables without a reliable timestamp, use INFORMATION_SCHEMA.TABLE_STORAGE_METRICS.LAST_ALTERED as a lighter-weight proxy.

Can a Snowflake task succeed but load no data?+

Yes. A Snowflake task executing an INSERT INTO ... SELECT will succeed if the SELECT returns zero rows — the task ran without error, it just wrote nothing. The task history shows SUCCEEDED. Volume monitoring against a baseline is the only way to detect this.

How do you detect schema drift in a Snowflake table?+

Query INFORMATION_SCHEMA.COLUMNS for the table after each load and compare the result to the previous snapshot. Any column present before but absent after — or any data type change — is a schema drift event worth alerting on.

What is the difference between Snowflake task monitoring and data quality monitoring?+

Snowflake task monitoring tells you whether the task ran and completed. Data quality monitoring tells you whether the output meets expectations: correct number of rows, correct schema, fresh timestamp. Task success is a prerequisite; data quality monitoring is what you run after the task succeeds to verify the output.

Related integrations

← All articles Share on LinkedIn