What happens when nobody watches the capacity
It's Wednesday, 09:15. A product manager opens a Power BI report backed by a Direct Lake semantic model. The report spins for 20 seconds before returning results. She refreshes. Same delay. She messages the data team: "Power BI is slow today."
The data engineer opens the Fabric Capacity Metrics App. The utilization chart shows a spike that started at 06:12 — a Spark notebook that ran concurrently with three scheduled semantic model refreshes. Together they consumed more CUs than the F4 capacity could handle in a 10-minute window. Fabric applied the first throttling stage: 20-second delays on all interactive operations.
Nobody got an alert. The Capacity Metrics App doesn't send notifications. The spike was already three hours old by the time someone checked. This is the gap between having capacity metrics available and actually monitoring them.
What is Microsoft Fabric capacity?
Microsoft Fabric capacity is a pool of compute resources allocated to your tenant, measured in Capacity Units (CUs). Every Fabric workload — Spark notebooks, data pipelines, semantic model refreshes, warehouse queries, dataflows — draws from this shared pool. You purchase a specific SKU (F2, F4, F8, F16, and so on), and that SKU determines how many CUs per second your capacity provides.
An F2 gives you 2 CUs per second. An F64 gives you 64. The CUs are shared across all workspaces assigned to that capacity. There is no per-workload isolation unless you assign workloads to separate capacities.
This shared model means that a runaway Spark job can consume CUs that your semantic model refreshes need. A warehouse query running during peak hours competes with your dataflow. The capacity doesn't distinguish between "important" and "experimental" — it allocates CUs on a first-come basis and applies throttling when the total exceeds what the SKU supports.
How Fabric Capacity Units (CUs) are consumed and smoothed
Every operation in Fabric consumes CU-seconds. A semantic model refresh that uses 1 CU for 30 seconds consumes 30 CU-seconds. A Spark notebook that runs for 10 minutes using 4 CUs consumes 2,400 CU-seconds.
Fabric doesn't charge all of that consumption to the instant it happened. It uses smoothing to spread the cost over future timepoints. A timepoint in Fabric is 30 seconds long — there are 2,880 timepoints in 24 hours.
The smoothing window depends on the operation type:
- Interactive operations (report queries, on-demand refreshes, API calls): smoothed over 5 to 64 minutes, depending on how many CUs they consumed.
- Background operations (scheduled refreshes, Spark jobs, dataflows): smoothed over 24 hours.
This is where capacity planning gets unintuitive. A single background job consuming 1 CU-hour (3,600 CU-seconds) only contributes about 2.1% to any individual timepoint on an F2 capacity. That same job on an F64 contributes 0.065% per timepoint. The larger your SKU, the less impact each operation has.
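The arithmetic behind those percentages can be reproduced in a few lines. This is a minimal sketch that assumes the F-SKU number equals the CUs per second and that a background job is spread evenly over every 30-second timepoint in the 24-hour window; the function names are illustrative and not part of any Fabric API.

```python
TIMEPOINT_SECONDS = 30
SECONDS_PER_DAY = 24 * 60 * 60


def sku_cus(sku: str) -> int:
    """Return CUs per second for an F-SKU name such as 'F2' or 'F64'."""
    if not sku.upper().startswith("F") or not sku[1:].isdigit():
        raise ValueError(f"unrecognized SKU name: {sku!r}")
    return int(sku[1:])


def smoothed_share_per_timepoint(cu_seconds: float, sku: str,
                                 window_seconds: int = SECONDS_PER_DAY) -> float:
    """Fraction of one timepoint's capacity that a smoothed job occupies."""
    timepoints = window_seconds / TIMEPOINT_SECONDS      # 2,880 for 24 hours
    per_timepoint = cu_seconds / timepoints              # CU-seconds charged per timepoint
    capacity_per_timepoint = sku_cus(sku) * TIMEPOINT_SECONDS
    return per_timepoint / capacity_per_timepoint


one_cu_hour = 1 * 3600  # 1 CU for one hour = 3,600 CU-seconds
print(f"F2:  {smoothed_share_per_timepoint(one_cu_hour, 'F2'):.3%}")   # F2:  2.083%
print(f"F64: {smoothed_share_per_timepoint(one_cu_hour, 'F64'):.3%}")  # F64: 0.065%
```

Passing a shorter `window_seconds` (for example 5 * 60) shows how much harder the same CU-seconds land when they are treated as an interactive operation.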
But smoothing is not forgiveness. Those CUs are still owed. They accumulate as "carryforward" and must be paid off by future idle capacity. If you keep running heavy operations without breaks, the carryforward grows until throttling kicks in.
The practical consequence: a 10-second Spark burst doesn't cause immediate throttling, but it does reserve future capacity. Stack enough bursts in sequence, and you fill the 10-minute overage window without any single operation looking excessive.
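To see how carryforward builds up and burns down, here is a deliberately simplified model. It assumes each timepoint has a fixed budget of CUs × 30 CU-seconds, that any usage above the budget is added to carryforward, and that any unused budget pays carryforward off. Fabric's real accounting is more detailed; the function name and the sample numbers are illustrative only.

```python
def carryforward_series(usage_per_timepoint, capacity_cus, timepoint_seconds=30):
    """Track accumulated carryforward (in CU-seconds) across consecutive timepoints.

    usage_per_timepoint: smoothed CU-seconds charged to each timepoint.
    capacity_cus: CUs per second provided by the SKU (4 for an F4).
    """
    budget = capacity_cus * timepoint_seconds  # CU-seconds available per timepoint
    carry = 0.0
    series = []
    for usage in usage_per_timepoint:
        # Overuse adds to the debt; idle budget pays it down, never below zero.
        carry = max(0.0, carry + usage - budget)
        series.append(carry)
    return series


# F4: 120 CU-seconds per timepoint. Three busy timepoints, then three quiet ones.
usage = [150, 150, 150, 60, 60, 60]
print(carryforward_series(usage, capacity_cus=4))
# [30.0, 60.0, 90.0, 30.0, 0.0, 0.0]
```

No single timepoint in this example looks dramatic, yet the debt triples before the quiet period clears it.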
The Fabric Capacity Metrics App — what it shows and what it doesn't
The Capacity Metrics App is a Power BI app that Microsoft provides for capacity admins. It connects to your Fabric tenant and displays compute utilization, storage, throttling events, and per-operation breakdowns.
Key pages:
Health page — high-level overview across all capacities you administer. Shows which capacities are consuming the most compute or experiencing throttling.
Compute page — 14 days of utilization data. Ribbon charts break down CU consumption by workload type (Spark, Warehouse, Semantic Model, Dataflows, AI Functions). The utilization trend chart shows when your capacity exceeded 100%. The throttling chart shows which throttling stage was active and for how long.
Timepoint page — drill into any 30-second window to see exactly which operations consumed how many CUs. This is where you diagnose specific spikes.
Storage page — 30 days of storage consumption by workspace.
What the app does not do:
- No alerting. Microsoft's own documentation confirms: "The Microsoft Fabric Capacity Metrics app doesn't support alerts or notifications." You must open the app and look. If you don't look, you don't know.
- 14-day compute retention. Anything older than 14 days is gone. You cannot analyze monthly trends or compare this week to the same week last month.
- No cross-tool context. The app shows CU consumption per operation but does not correlate it with ADF pipeline runs, dbt jobs, or downstream Power BI report performance. A spike at 06:12 is just a number — you have to manually trace which pipeline triggered it.
- Data latency of 10-15 minutes. Usage data becomes visible roughly 10 to 15 minutes after the activity. A throttling event at 06:12 shows up between about 06:22 and 06:27.
For a capacity admin who checks the app daily, these limitations are manageable. For a team that needs to respond to throttling before users notice, they are insufficient.
How Fabric throttling works — the three stages
Fabric throttling is progressive. It doesn't immediately reject operations when utilization hits 100%. Instead, it provides a 10-minute overage protection window: your capacity can consume up to 10 minutes of future CUs without any throttling.
Once that window is exhausted, throttling starts in stages:
Stage 1 — Interactive Delay (10 min < overage ≤ 60 min)
All new interactive operations (report queries, on-demand refreshes, API requests) receive a 20-second delay before execution. Background operations continue unaffected. Users experience this as "Power BI is slow today" without a clear error message.
Stage 2 — Interactive Rejection (60 min < overage ≤ 24 hours)
New interactive operations are rejected outright. Users see error code CapacityLimitExceeded and the message "Your organization's Fabric compute capacity has exceeded its limits. Try again later." Background operations can still start.
Stage 3 — Background Rejection (overage > 24 hours)
All new requests — interactive and background — are rejected. Scheduled refreshes fail. Spark jobs won't start. The capacity is effectively frozen until idle time burns down the accumulated overage.
Operations already in flight are never throttled. A Spark notebook that started before throttling continues to completion. This prevents data loss but also means that long-running jobs continue consuming CUs, potentially extending the throttling period.
The triggers to watch: your capacity's consumed CUs relative to the 10-minute, 60-minute, and 24-hour windows. The Capacity Metrics App shows these as percentage lines on the throttling chart. When the 10-minute line approaches 100%, stage 1 is imminent.
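The stage boundaries can be written down as a small lookup. This sketch assumes overage is expressed as the number of minutes of full capacity needed to pay off the accumulated carryforward (carryforward CU-seconds divided by CUs × 60). The names `overage_minutes` and `throttle_stage` are illustrative and do not correspond to a Fabric API.

```python
from enum import Enum


class ThrottleStage(Enum):
    NONE = "overage protection (no throttling)"
    INTERACTIVE_DELAY = "stage 1: interactive delay"
    INTERACTIVE_REJECTION = "stage 2: interactive rejection"
    BACKGROUND_REJECTION = "stage 3: background rejection"


def overage_minutes(carryforward_cu_seconds: float, capacity_cus: int) -> float:
    """Minutes of full capacity needed to pay off the carryforward (assumed definition)."""
    return carryforward_cu_seconds / (capacity_cus * 60)


def throttle_stage(minutes: float) -> ThrottleStage:
    """Map minutes of overage to the stage boundaries described above."""
    if minutes <= 10:
        return ThrottleStage.NONE
    if minutes <= 60:
        return ThrottleStage.INTERACTIVE_DELAY
    if minutes <= 24 * 60:
        return ThrottleStage.INTERACTIVE_REJECTION
    return ThrottleStage.BACKGROUND_REJECTION


# An F4 that owes 9,600 CU-seconds is 40 minutes in debt.
minutes = overage_minutes(9600, capacity_cus=4)
print(f"{minutes:.0f} min overage -> {throttle_stage(minutes).value}")
# 40 min overage -> stage 1: interactive delay
```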
Common triggers that fill the overage window
Three patterns account for most unplanned throttling events on mid-size Fabric capacities (F4 through F64):
Concurrent scheduled refreshes in the same time window. Many teams schedule semantic model refreshes at 06:00 because that's when source data lands. Four semantic models refreshing simultaneously on an F4 can easily exceed the 10-minute buffer. The fix: stagger refreshes by 5-10 minutes, or move large models to background scheduling.
Spark notebooks without CU budgets. A data scientist running an exploratory notebook during business hours can consume hundreds of CU-seconds in bursts. Because interactive Spark operations are smoothed over only 5-64 minutes, the impact hits the capacity fast. The fix: assign experimental workloads to a separate low-priority capacity, or schedule heavy notebooks as background jobs.
Warehouse queries during report peak hours. If your data warehouse runs maintenance queries (table optimization, statistics recalculation) during the same hours users are querying reports, both compete for the same CU pool. Warehouse operations are classified as background and smoothed over 24 hours, which helps — but a large enough query still contributes meaningfully to the 10-minute window.
The pattern across all three: multiple workload types competing for a shared pool without coordination. The Capacity Metrics App can show you this after the fact. Proactive monitoring means knowing it's happening now.
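The staggering fix from the first pattern is easy to plan on paper before touching any schedule. This sketch only computes start times; it does not call Fabric or Power BI, and the model names and the date are made up for illustration.

```python
from datetime import datetime, timedelta


def stagger_refreshes(model_names, first_start, gap_minutes=10):
    """Assign each model a start time, spaced gap_minutes apart, in list order."""
    return {
        name: first_start + timedelta(minutes=i * gap_minutes)
        for i, name in enumerate(model_names)
    }


# Hypothetical models that all used to refresh at 06:00.
models = ["sales", "finance", "inventory", "marketing"]
schedule = stagger_refreshes(models, datetime(2025, 1, 8, 6, 0), gap_minutes=10)
for name, start in schedule.items():
    print(f"{start:%H:%M}  {name}")
# 06:00  sales
# 06:10  finance
# 06:20  inventory
# 06:30  marketing
```

Ordering the list so the largest models come first, or last, is a judgment call that depends on which reports need fresh data earliest.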
Metrics to track daily for proactive capacity monitoring
If you check nothing else, track these four signals:
1. Peak CU utilization percentage (10-minute window)
This is the metric that determines whether stage 1 throttling triggers. On the Capacity Metrics App's compute page, the utilization chart shows the 10-minute overage as a percentage. When this consistently exceeds 80%, you have limited headroom for unexpected spikes.
2. Throttle events per day
The System Events table on the compute page logs when throttling stages were activated. One throttle event per week on an F4 might be acceptable. Daily throttle events signal that your capacity is undersized for your workload pattern, or that scheduling needs adjustment.
3. Minutes to burndown
When you drill through to the throttling chart, Fabric shows an estimated "minutes to burndown" — how long the capacity would take to return to a non-throttled state if no new operations run. If this number exceeds 60 minutes, you're accumulating carryforward faster than you're paying it off.
4. Top CU consumers by item
The matrix by item and operation on the compute page shows cumulative CU consumption over 14 days. Identify which items consistently dominate. A single semantic model consuming 40% of your capacity's CUs is a candidate for optimization or capacity separation.
The challenge: you need to check the Capacity Metrics App to see any of this. There is no push notification when your 10-minute utilization crosses 80%. There is no email when a throttle event occurs. You either build a polling mechanism yourself, or you accept that throttling will be discovered by users before the data team.
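If you do build a polling mechanism yourself, the evaluation step might look like the sketch below. It assumes you already have the four readings for the day from some source. How you obtain them is deliberately left out, and `DailySnapshot` is a hypothetical structure, not something Fabric provides. The thresholds are the ones mentioned above.

```python
from dataclasses import dataclass


@dataclass
class DailySnapshot:
    """One day's readings for a capacity (hypothetical shape)."""
    peak_utilization_pct: float  # peak 10-minute utilization
    throttle_events: int         # throttling stage activations that day
    minutes_to_burndown: float   # estimated time to clear carryforward
    top_item_share_pct: float    # share of CUs used by the largest item


def check_signals(snapshot: DailySnapshot) -> list[str]:
    """Return a human-readable warning for every signal that crosses its threshold."""
    warnings = []
    if snapshot.peak_utilization_pct > 80:
        warnings.append(f"peak 10-minute utilization at {snapshot.peak_utilization_pct:.0f}%")
    if snapshot.throttle_events >= 1:
        warnings.append(f"{snapshot.throttle_events} throttle event(s) today")
    if snapshot.minutes_to_burndown > 60:
        warnings.append(f"burndown estimate of {snapshot.minutes_to_burndown:.0f} minutes")
    if snapshot.top_item_share_pct >= 40:
        warnings.append(f"one item uses {snapshot.top_item_share_pct:.0f}% of CUs")
    return warnings


today = DailySnapshot(peak_utilization_pct=85, throttle_events=1,
                      minutes_to_burndown=75, top_item_share_pct=45)
for warning in check_signals(today):
    print("WARN:", warning)
```

A quiet day returns an empty list, so the caller can decide whether to stay silent or send an all-clear.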
What proactive monitoring looks like with MetricSign
MetricSign connects to your Fabric capacity via the same admin APIs that the Capacity Metrics App uses. The difference: MetricSign checks continuously and routes alerts before the overage window fills.
A concrete workflow: your F8 capacity hits 75% of its 10-minute CU budget at 06:14 because two scheduled refreshes and a Spark notebook started within 90 seconds of each other. MetricSign detects the acceleration, calculates the projected overage based on the current consumption rate, and sends a Telegram alert to the data engineering channel at 06:15 — before the 10-minute window is full, before any interactive delays begin.
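To illustrate the idea of projecting overage from the current consumption rate, here is a plain linear extrapolation between two utilization samples. This is not MetricSign's actual algorithm, which this article does not describe. The function name and the sample values are assumptions chosen to match the scenario above.

```python
def minutes_until_full(samples, limit_pct=100.0):
    """Estimate minutes until utilization reaches limit_pct.

    samples: list of (minute_offset, utilization_pct) tuples, oldest first.
    Returns 0.0 if already at the limit, None if utilization is flat or falling.
    """
    (t0, u0), (t1, u1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    if u1 >= limit_pct:
        return 0.0
    rate = (u1 - u0) / (t1 - t0)  # percentage points per minute
    if rate <= 0:
        return None
    return (limit_pct - u1) / rate


# Hypothetical readings: 55% at 06:12, 75% at 06:14.
eta = minutes_until_full([(0, 55.0), (2, 75.0)])
print(f"10-minute budget full in about {eta} minutes")
# 10-minute budget full in about 2.5 minutes
```

A linear fit is the crudest possible projection. It overreacts to short bursts, which is why a real alerting rule would likely smooth over more than two samples.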
The alert includes: which operations are consuming the most CUs right now, which downstream reports would be affected if throttling hits, and a link to the specific timepoint in the Capacity Metrics App for deeper investigation.
This is not a replacement for the Capacity Metrics App. The app remains the best tool for historical analysis and capacity planning. MetricSign adds what the app lacks: real-time alerting with cross-stack context. When a Spark notebook causes throttling that delays a Power BI report refresh, MetricSign connects both events in one incident — rather than showing them as unrelated entries in separate tools.
Limitation worth noting: MetricSign monitors capacity metrics at the same 10-15 minute data latency that the Capacity Metrics App has. It cannot alert faster than Microsoft makes the data available. For near-real-time awareness, the Admin API polling interval is the bottleneck, not MetricSign.
MetricSign is free to start, connects to your Fabric capacity in under 10 minutes, and begins monitoring without changes to your existing workloads or schedules.
