Medium severity · data flow
Power BI Refresh Error:
DF-Executor-OutOfMemorySparkBroadcastError
What does this error mean?
Spark ran out of memory while attempting to broadcast a dataset for a join: the dataset chosen for broadcast is larger than the available executor heap. This is a more specific variant of OutOfMemoryError that occurs during the broadcast phase of a join operation, where Spark replicates one entire side of the join to every executor.
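The mechanics can be illustrated with a minimal plain-Python sketch of a broadcast hash join (illustrative only; `broadcast_hash_join` and the column names are invented here, and Spark's real implementation differs). The key point is that the build side is fully materialized as an in-memory hash table, so its entire size must fit in the heap:

```python
# Conceptual sketch of a broadcast hash join. The "broadcast" side is
# materialized as an in-memory hash table on every executor, so its full
# size must fit in executor heap; the other side is merely streamed.

def broadcast_hash_join(large_rows, small_rows, key):
    # Build phase: the entire small side becomes a hash table.
    # If small_rows is actually large, this table is what exhausts the heap.
    table = {}
    for row in small_rows:
        table.setdefault(row[key], []).append(row)
    # Probe phase: stream the large side one row at a time.
    for row in large_rows:
        for match in table.get(row[key], []):
            yield {**row, **match}

facts = [{"dim_id": 1, "amount": 100}, {"dim_id": 2, "amount": 200}]
dims = [{"dim_id": 1, "label": "A"}, {"dim_id": 2, "label": "B"}]
joined = list(broadcast_hash_join(facts, dims, "dim_id"))
print(joined)  # two rows, each enriched with its matching label
```

Note that memory use scales with the *broadcast* side only, which is why the fixes below focus on shrinking or disabling that side.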
Common causes
- The dataset on one side of a join is too large to fit in executor memory as a broadcast hash table.
- Multiple Join transformations in the same data flow each use broadcast, compounding memory pressure across all joins.
- The Azure IR compute type has insufficient memory per executor node for the broadcast dataset size.
- The Broadcast option is set to 'Auto' and Spark estimates the dataset as small enough to broadcast, but the actual data is larger due to data growth.
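The last cause can be sketched in plain Python (the `should_broadcast` helper is invented for illustration; 10 MB is Spark's documented default for `spark.sql.autoBroadcastJoinThreshold`). The broadcast decision is made from an *estimated* size, so stale statistics can select broadcast for data that has since grown far past the threshold:

```python
# Sketch of cause 4: auto-broadcast decides from an estimated size
# compared against a threshold, not from the data actually read at runtime.
AUTO_BROADCAST_THRESHOLD = 10 * 1024 * 1024  # 10 MB, Spark's default

def should_broadcast(estimated_size_bytes):
    # Hypothetical planner check, mirroring a threshold-based decision.
    return estimated_size_bytes <= AUTO_BROADCAST_THRESHOLD

estimated = 8 * 1024 * 1024   # stale statistics from before data growth
actual = 2 * 1024 ** 3        # the table's true size at execution time

print(should_broadcast(estimated))  # True: broadcast is selected...
# ...but `actual` bytes must then be materialized on every executor,
# which is where the OutOfMemorySparkBroadcastError surfaces.
```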
How to fix it
- Open each Join transformation in the data flow and set the Broadcast option to 'Off' to prevent Spark from attempting to broadcast either side.
- If you want to keep broadcast joins for small lookup tables, set Broadcast to 'Fixed' only on the smaller side of the join, not the large fact table.
- Lower Spark's auto-broadcast size threshold (the `spark.sql.autoBroadcastJoinThreshold` setting, in bytes) so Spark stops auto-selecting broadcast for datasets above that size.
- Increase the Azure IR compute type to give Spark executors more heap memory for broadcast operations.
- Pre-aggregate or filter the dataset being broadcast to reduce its size before the join transformation.
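The last fix, shrinking the broadcast side before the join, can be sketched like this (plain Python; `shrink_for_broadcast` and the column names are hypothetical, and in a data flow the equivalent would be a Filter or Select transformation upstream of the Join):

```python
# Sketch of fix 5: reduce the broadcast side to only the rows the fact
# table references and only the columns the join actually consumes,
# before the broadcast happens.

def shrink_for_broadcast(rows, needed_keys, columns):
    return [
        {c: row[c] for c in columns}
        for row in rows
        if row["dim_id"] in needed_keys
    ]

dims = [
    {"dim_id": 1, "label": "A", "notes": "x" * 1000},  # wide, mostly unused
    {"dim_id": 2, "label": "B", "notes": "y" * 1000},
    {"dim_id": 3, "label": "C", "notes": "z" * 1000},
]
facts = [{"dim_id": 2, "amount": 50}]

needed = {r["dim_id"] for r in facts}
small = shrink_for_broadcast(dims, needed, ["dim_id", "label"])
print(small)  # [{'dim_id': 2, 'label': 'B'}]
```

Dropping unreferenced rows and wide unused columns is often enough to bring the broadcast side back under the executor heap without disabling broadcast entirely.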
Official documentation: https://learn.microsoft.com/en-us/azure/data-factory/data-flow-troubleshoot-guide