Severity: Medium · Category: Data flow

Power BI Refresh Error:
DF-Executor-BroadcastTimeout

What does this error mean?

A Spark broadcast join timed out — the dataset being broadcast to executor nodes took longer than Spark's broadcast timeout to transmit. This indicates the broadcast dataset is too large, the cluster is under load, or network throughput between the IR nodes and storage is limited.
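As a back-of-envelope check, you can estimate whether a broadcast side is likely to exceed the timeout window from its size and the effective read throughput. All numbers below are illustrative assumptions, not values ADF reports:

```python
def broadcast_transfer_seconds(dataset_bytes: int, throughput_bytes_per_sec: float) -> float:
    """Rough estimate of how long reading and transmitting the broadcast side takes."""
    return dataset_bytes / throughput_bytes_per_sec

# Illustrative: a 6 GB broadcast side read at ~15 MB/s from constrained storage.
est = broadcast_transfer_seconds(6 * 1024**3, 15 * 1024**2)
timeout = 300  # Spark's default broadcast timeout in seconds (5 minutes)
print(f"estimated {est:.0f}s vs {timeout}s timeout -> {'at risk' if est > timeout else 'ok'}")
```

If the estimate lands anywhere near the timeout, the broadcast side is a candidate for pre-filtering or for disabling broadcast outright.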

Common causes

  • The dataset being broadcast is too large to transmit to all executor nodes within Spark's default broadcast timeout (5 minutes for standard IR)
  • The Azure IR cluster is under heavy load and the broadcast transmission is delayed by competing operations
  • Network throughput between the integration runtime and the source storage is limited, slowing the broadcast data read
  • The Broadcast option is set to 'Auto' and Spark incorrectly estimates a large dataset as broadcastable
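In the data flow script behind the join transformation, the broadcast choice appears as a `broadcast` property on the join. A rough sketch follows; the stream, key, and transformation names are hypothetical, and you should check the script ADF Studio generates for the exact values your version emits:

```
OrdersSource, CustomersSource join(customer_id == id,
    joinType: 'inner',
    broadcast: 'off')~> JoinCustomers
```

With 'Auto', Spark itself decides which side to broadcast based on estimated size, which is where cause 4 above comes from.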

How to fix it

  1. Open the failing join transformation in ADF Studio and set the Broadcast option to 'Off' to disable the broadcast join.
  2. If you want to keep 'Auto' broadcast for small lookup datasets, set the data flow parameter `azure.sizeAmbiguityRelationThreshold` to a lower byte value so that larger datasets are no longer auto-selected for broadcast.
  3. Pre-filter the broadcast side to reduce its size before the join transformation — project only the columns and rows the join actually needs.
  4. Use larger compute for the Azure IR (for example, memory-optimized nodes) — bigger nodes read and hold the broadcast data faster. Note that adding cores alone adds executors that must each receive the broadcast, which can make the timeout worse.
  5. Let Spark fall back to a sort-merge join by turning broadcast off — this shuffles and sorts both sides instead of transmitting one side to every executor.
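Fix 3 (shrinking the broadcast side before the join) works because a broadcast hash join turns the small side into an in-memory lookup table on every executor — the less you ship, the faster that completes. A minimal pure-Python sketch of the idea, with made-up rows and column names:

```python
# Toy rows: (order_id, customer_id) on the large side, (customer_id, region) on the small side.
orders = [(1, "c1"), (2, "c2"), (3, "c3"), (4, "c1")]
customers = [("c1", "EU"), ("c2", "US"), ("c3", "EU"), ("c4", "APAC")]

# Pre-filter the broadcast side: only keys that can actually match need to be shipped.
needed_keys = {cust_id for _, cust_id in orders}
lookup = {cid: region for cid, region in customers if cid in needed_keys}

# The "broadcast hash join": probe the small lookup table with each large-side row.
joined = [(oid, cid, lookup[cid]) for oid, cid in orders if cid in lookup]
print(joined)  # note that c4 was filtered out and never "broadcast"
```

In a real data flow you would apply the equivalent filter or column projection in a transformation upstream of the join.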

Frequently asked questions

What is the difference between BroadcastTimeout and BroadcastFailure?

Both errors point to an oversized broadcast side. BroadcastFailure is a memory allocation error — the dataset is too large to fit in executor memory. BroadcastTimeout occurs when transmission exceeds Spark's timeout window. Both are typically resolved by disabling broadcast on the join transformation.

Can increasing the IR size fix a broadcast timeout?

Not reliably. More cores can actually worsen the timeout, because each added executor must receive the broadcast. Increasing node size (rather than core count) helps if the dataset then fits and transfers in time — but disabling broadcast is the most reliable fix.

Is there a way to increase Spark's broadcast timeout threshold?

Not directly — ADF does not expose Spark's broadcast timeout as a setting. You can, however, add the data flow parameter `azure.sizeAmbiguityRelationThreshold` to lower the auto-broadcast threshold so that large datasets are not selected for broadcast at all. Disabling broadcast on the join remains the more reliable solution.
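For context, the underlying open-source Spark settings are `spark.sql.broadcastTimeout` and `spark.sql.autoBroadcastJoinThreshold`. In environments where you control the Spark session directly (not an ADF-managed runtime), they can be set as below — a config sketch assuming an existing `SparkSession` named `spark`:

```python
# Assumes `spark` is an existing pyspark.sql.SparkSession (not available inside ADF data flows).
spark.conf.set("spark.sql.broadcastTimeout", "600")          # timeout in seconds
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1") # -1 disables auto-broadcast
```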

Will downstream Power BI datasets be affected?

Yes — when the data flow activity fails, the pipeline run fails, and any downstream Power BI datasets that refresh from its output will continue to serve stale data until a successful run completes.

Official documentation: https://learn.microsoft.com/en-us/azure/data-factory/data-flow-troubleshoot-guide

Other data flow errors