Medium severity · Data flow
Power BI Refresh Error: DF-Executor-BroadcastTimeout
What does this error mean?
A Spark broadcast join timed out — the dataset being broadcast to executor nodes took longer than Spark's broadcast timeout to transmit. This indicates the broadcast dataset is too large, the cluster is under load, or network throughput between the IR nodes and storage is limited.
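The timeout and the auto-broadcast size limit are ordinary Spark settings. ADF manages them on the Azure IR, so they are not directly editable there, but on a self-managed Spark cluster the equivalent knobs look roughly like this (values shown are Spark's documented defaults):

```
# spark-defaults.conf (illustrative; managed Azure IRs set these internally)
spark.sql.broadcastTimeout            300        # seconds a broadcast may take before the join aborts
spark.sql.autoBroadcastJoinThreshold  10485760   # max size in bytes (10 MB) for a relation to be auto-broadcast
```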
Common causes
1. The dataset being broadcast is too large to transmit to all executor nodes within Spark's default broadcast timeout (5 minutes for a standard IR)
2. The Azure IR cluster is under heavy load and the broadcast transmission is delayed by competing operations
3. Network throughput between the integration runtime and the source storage is limited, slowing the broadcast data read
4. The Broadcast option is set to 'Auto' and Spark mis-estimates a large dataset as small enough to broadcast
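Cause 4 comes down to a size estimate being compared against a threshold. The sketch below is not ADF or Spark source code, just a minimal illustration of that decision; the function name and the 10 MB default (Spark's documented `spark.sql.autoBroadcastJoinThreshold`) are the only assumptions:

```python
# Illustrative sketch of an auto-broadcast decision, modeled on Spark's
# autoBroadcastJoinThreshold. Names are hypothetical, not ADF internals.

AUTO_BROADCAST_THRESHOLD = 10 * 1024 * 1024  # Spark's default: 10 MB

def should_broadcast(estimated_size_bytes: int) -> bool:
    """Broadcast a join side only when its estimated size is within the threshold."""
    return 0 <= estimated_size_bytes <= AUTO_BROADCAST_THRESHOLD

# A stale or missing statistic can make a large table look small, which is
# how an 'Auto' mis-estimate leads to broadcasting data that then times out.
print(should_broadcast(2 * 1024 * 1024))    # small side: eligible for broadcast
print(should_broadcast(500 * 1024 * 1024))  # large side: should not be broadcast
```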
How to fix it
1. Open the failing join transformation in ADF Studio and set the Broadcast option to 'Off' to disable the broadcast join.
2. If broadcast joins are required, tune the Spark-level settings where your runtime exposes them: raise the broadcast timeout (`spark.sql.broadcastTimeout`, in seconds) or adjust the auto-broadcast size threshold (`spark.sql.autoBroadcastJoinThreshold`, in bytes).
3. Pre-filter the broadcast dataset to reduce its size before the join transformation.
4. Increase the Azure IR core count; a larger cluster transmits the broadcast data faster and is less likely to hit the timeout window.
5. If the join key is highly selective, switch to a sort-merge join strategy, which avoids broadcasting entirely.
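In the data flow script that ADF Studio generates, fix 1 corresponds to the join transformation's `broadcast` property. The fragment below is approximate; the stream names, column name, and output name are hypothetical:

```
ordersSource, customersSource join(
    ordersSource@customerId == customersSource@customerId,
    joinType: 'inner',
    broadcast: 'none'
) ~> JoinOrders
```

Setting `broadcast: 'none'` in script mirrors choosing 'Off' in the Studio UI, forcing Spark to use a non-broadcast join regardless of size estimates.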
Official documentation: https://learn.microsoft.com/en-us/azure/data-factory/data-flow-troubleshoot-guide