What Snowflake monitors and what it doesn't
Snowflake is a cloud data warehouse. Its native observability covers query execution: history, performance, compute credits consumed, warehouse utilization, and task run status. These are the right signals for operational database management — they tell you whether queries are running efficiently and whether scheduled tasks completed.
They do not tell you: - Whether the data loaded into a table is fresh relative to the source system - Whether today's batch load brought fewer rows than expected - Whether a column that was present in last week's schema is missing today - Whether the output that downstream consumers are reading is correct
This is the monitoring gap in Snowflake. The warehouse is doing its job — executing queries, returning results, running tasks. The data quality problems are upstream: the source wrote partial data, the ETL pipeline applied a wrong filter, the schema changed without notice. Snowflake faithfully stores and serves whatever arrived.
Effective data quality monitoring for Snowflake means adding checks at the output layer — what did the table actually receive — rather than at the execution layer. A data observability platform that connects to Snowflake checks freshness, volume, and schema on a schedule, independent of whether queries ran successfully.
The three failure modes Snowflake won't surface
Stale data from a slow source. A Fivetran sync or ADF pipeline copies Salesforce data to Snowflake at 02:00 every night. The Salesforce API rate-limiting causes the sync to time out after processing records through the previous afternoon. The pipeline marks the sync as completed. Snowflake has data — just data from 14 hours ago, not 2 hours ago. The CURRENT_TIMESTAMP() of the Snowflake query is accurate; the MAX(created_at) of the loaded records is not.
Zero-row or partial-row loads. A Snowflake task executes an INSERT INTO ... SELECT statement. If the SELECT returns zero rows (because a date filter matched nothing, a source table was truncated, or a JOIN produced no output), the INSERT completes successfully with zero rows written. Snowflake executed the task. The task history shows SUCCEEDED. The table is unchanged, and whatever Power BI or downstream BI consumers are reading is as stale as the last non-empty load.
Schema drift in the source. An upstream system renames a column. The ETL pipeline uses a wildcard (SELECT *) or does not enforce column-level validation. The renamed column is absent from the next load. Depending on how Snowflake's COPY command and the table definition interact, this results in: a NULL-filled column (if COPY uses FILL_DEFAULT), a missing column in a variant payload (if data is ingested as JSON), or a load error (if schema enforcement is strict). Only the last case generates a visible error. The first two succeed silently.
Watermark monitoring: the simplest high-value check
The watermark check is the fastest way to add meaningful data quality monitoring to Snowflake tables that have a timestamp column.
The query is straightforward:
SELECT MAX(updated_at) AS watermark,
DATEDIFF('hour', MAX(updated_at), CURRENT_TIMESTAMP()) AS hours_since_last_update
FROM analytics.fact_orders;Run this after each expected pipeline completion. Compare the result against a threshold: for a table that should be updated daily by 06:00, a watermark older than 30 hours at 07:00 is an anomaly worth alerting on.
The check costs one Snowflake query per table per monitoring interval. On a compressed Snowflake table with a good partition key, this is seconds. The value is catching the 'pipeline ran but source was stale' failure before stakeholders open their reports.
For tables without a reliable timestamp column — append-only or event tables where CURRENT_TIMESTAMP() is injected at load time — use the Snowflake INFORMATION_SCHEMA.TABLE_STORAGE_METRICS view to check LAST_ALTERED, which reflects the last DML operation on the table. This is a lighter-weight proxy, though it cannot distinguish between a meaningful data update and a metadata-only change.
Teams running dbt Snowflake monitoring often implement watermark checks as dbt source freshness configurations — a clean pattern for transformation-layer freshness that runs as part of the dbt build process.
Volume baselines: detecting the partial load that passes every check
A table can have a fresh watermark and still be missing 40% of its expected rows. This happens when a pipeline loads a subset of records correctly — the most recent timestamp is current, but the batch was truncated or the source query returned fewer records than normal.
Volume monitoring addresses this with a baseline comparison:
- Build a rolling baseline of expected row counts per table per day of week (4–6 weeks of history gives a stable baseline for most tables)
- After each load, compare the current count against the baseline for the same day of week
- Alert when the deviation exceeds a configured threshold — typically 15–25% for stable tables
The day-of-week calibration matters. A Snowflake fact table might receive 100,000 rows on a Tuesday and 60,000 rows on a Sunday — both correct. A flat threshold of '20% below Monday's baseline' would fire false positives every Sunday. The right comparison is Monday vs Monday, Sunday vs Sunday.
For tables loaded by dbt Cloud or ADF into Snowflake, MetricSign tracks these baselines automatically after each pipeline run, without requiring manual threshold configuration per table.
Schema monitoring: catching the renamed column before Power BI shows blanks
Schema drift in Snowflake tables is a pipeline problem that typically originates upstream: a source system renames a field, the extraction layer does not validate the schema, and the new column structure arrives in Snowflake without warning.
Snowflake does not detect or alert on schema changes in loaded data by default. The SHOW COLUMNS command on a table gives you the current schema, but comparing it against the previous schema requires external tooling or a scheduled SQL check.
A schema monitoring check:
-- Run this after each load and compare to previous results
SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'ANALYTICS'
AND TABLE_NAME = 'FACT_ORDERS'
ORDER BY ORDINAL_POSITION;Saving the output after each run and comparing it to the previous snapshot catches column additions, removals, and type changes. Any difference is worth alerting on, because schema changes in a Snowflake table consumed by Power BI or Tableau will break visuals downstream.
This connects directly to the broader picture of data quality monitoring versus observability. Schema monitoring is an observability check — you are not asserting what the schema should be (that is a quality rule), you are watching whether it changed (that is an observability signal). For a deeper look at how the two categories interact in practice, see data quality monitoring tools.