High severity
Databricks Model Serving Error:
MODEL_SERVING_RATE_LIMIT_EXCEEDED
What does this error mean?
Too many requests were sent to a Databricks Model Serving endpoint in a given time window, exceeding either the endpoint's provisioned concurrency or the workspace-level parallel request quota.
Common causes
- A batch inference job sends too many concurrent requests without throttling logic
- A traffic spike from multiple upstream jobs all calling the same endpoint at their scheduled time
- Provisioned concurrency is set too low relative to actual peak traffic
- The workspace has hit its maximum parallel request limit and requires a quota increase
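The first cause above (unthrottled concurrent requests) is usually avoidable on the client side by capping in-flight requests. A minimal sketch, assuming a hypothetical `score_row` helper standing in for an HTTP call to the endpoint's `/invocations` API:

```python
import concurrent.futures

def score_row(row):
    # Hypothetical stand-in for a REST call to the serving endpoint;
    # replace the body with a real HTTP request in practice.
    return {"input": row, "prediction": row * 2}

def score_batch(rows, max_concurrency=4):
    """Score rows with at most `max_concurrency` requests in flight,
    so the client never exceeds the endpoint's provisioned concurrency."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # pool.map preserves input order and caps parallelism at max_workers.
        return list(pool.map(score_row, rows))

results = score_batch(range(10), max_concurrency=4)
```

Set `max_concurrency` at or below the endpoint's provisioned concurrency; the thread pool then acts as a simple throttle without any extra coordination logic.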
How to fix it
1. Check the serving endpoint's Traffic and Metrics tab in the Databricks UI to confirm when the concurrency spike occurred.
2. Enable autoscaling on the endpoint so it can scale up provisioned concurrency automatically during spikes.
3. Add retry logic with exponential backoff in the calling application to handle transient 429 responses gracefully.
4. If the workspace-level parallel request limit is hit (not just the endpoint's concurrency), contact Databricks support to request a quota increase.
5. For batch scoring, use a Databricks job with parallel tasks instead of looping over rows against the REST endpoint from a single client.
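Step 3 above can be sketched as a small retry wrapper. This is an illustrative pattern, not a Databricks API; `RateLimitError` and `fake_endpoint` are hypothetical stand-ins for an HTTP 429 response and the real endpoint call:

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for an HTTP 429 Too Many Requests response."""

def call_with_backoff(request_fn, max_retries=5, base_delay=1.0):
    """Retry `request_fn` on rate-limit errors, doubling the wait each
    attempt and adding jitter so concurrent callers don't retry in lockstep."""
    for attempt in range(max_retries):
        try:
            return request_fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the 429 to the caller
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)

# Hypothetical flaky endpoint: returns 429 twice, then succeeds.
attempts = {"n": 0}
def fake_endpoint():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return {"predictions": [0.9]}

result = call_with_backoff(fake_endpoint, base_delay=0.01)
```

If the endpoint returns a `Retry-After` header, honoring it directly is preferable to a computed delay; the exponential schedule is the fallback when no hint is provided.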