Medium severity · data flow
Power BI Refresh Error:
DF-Executor-OutOfMemorySparkBroadcastError
What does this error mean?
Spark ran out of memory while attempting to broadcast a dataset for a join: the dataset chosen for broadcast is larger than the available executor heap. This is a more specific variant of OutOfMemoryError that occurs during the broadcast phase of a join operation, where Spark replicates one entire side of the join to every executor.
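The mechanics can be illustrated with a minimal plain-Python sketch of a broadcast hash join (illustrative only; `broadcast_hash_join` and the column names are invented here, and Spark's real implementation differs). The key point is that the build side is fully materialized as an in-memory hash table, so its entire size must fit in the heap:

```python
# Conceptual sketch of a broadcast hash join. The "broadcast" side is
# materialized as an in-memory hash table on every executor, so its full
# size must fit in executor heap; the other side is merely streamed.

def broadcast_hash_join(large_rows, small_rows, key):
    # Build phase: the entire small side becomes a hash table.
    # If small_rows is actually large, this table is what exhausts the heap.
    table = {}
    for row in small_rows:
        table.setdefault(row[key], []).append(row)
    # Probe phase: stream the large side one row at a time.
    for row in large_rows:
        for match in table.get(row[key], []):
            yield {**row, **match}

facts = [{"dim_id": 1, "amount": 100}, {"dim_id": 2, "amount": 200}]
dims = [{"dim_id": 1, "label": "A"}, {"dim_id": 2, "label": "B"}]
joined = list(broadcast_hash_join(facts, dims, "dim_id"))
print(joined)  # two rows, each enriched with its matching label
```

Note that memory use scales with the *broadcast* side only, which is why the fixes below focus on shrinking or disabling that side.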
Common causes
- The dataset on one side of a join is too large to fit in executor memory as a broadcast hash table.
- Multiple Join transformations in the same data flow each use broadcast, compounding memory pressure across all joins.
- The Azure IR compute type has insufficient memory per executor node for the broadcast dataset size.
- The Broadcast option is set to 'Auto' and Spark estimates the dataset as small enough to broadcast, but the actual data is larger due to data growth.
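The last cause can be sketched in plain Python (the `should_broadcast` helper is invented for illustration; 10 MB is Spark's documented default for `spark.sql.autoBroadcastJoinThreshold`). The broadcast decision is made from an *estimated* size, so stale statistics can select broadcast for data that has since grown far past the threshold:

```python
# Sketch of cause 4: auto-broadcast decides from an estimated size
# compared against a threshold, not from the data actually read at runtime.
AUTO_BROADCAST_THRESHOLD = 10 * 1024 * 1024  # 10 MB, Spark's default

def should_broadcast(estimated_size_bytes):
    # Hypothetical planner check, mirroring a threshold-based decision.
    return estimated_size_bytes <= AUTO_BROADCAST_THRESHOLD

estimated = 8 * 1024 * 1024   # stale statistics from before data growth
actual = 2 * 1024 ** 3        # the table's true size at execution time

print(should_broadcast(estimated))  # True: broadcast is selected...
# ...but `actual` bytes must then be materialized on every executor,
# which is where the OutOfMemorySparkBroadcastError surfaces.
```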
How to fix it
- Open each Join transformation in the data flow and set the Broadcast option to 'Off' to prevent Spark from attempting to broadcast either side.
- If you want to keep broadcast joins for small lookup tables, set Broadcast to 'Fixed' only on the smaller side of the join, not the large fact table.
- Lower Spark's auto-broadcast size threshold (the `spark.sql.autoBroadcastJoinThreshold` setting, in bytes) so Spark stops auto-selecting broadcast for datasets above that size.
- Increase the Azure IR compute type to give Spark executors more heap memory for broadcast operations.
- Pre-aggregate or filter the dataset being broadcast to reduce its size before the join transformation.
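The last fix, shrinking the broadcast side before the join, can be sketched like this (plain Python; `shrink_for_broadcast` and the column names are hypothetical, and in a data flow the equivalent would be a Filter or Select transformation upstream of the Join):

```python
# Sketch of fix 5: reduce the broadcast side to only the rows the fact
# table references and only the columns the join actually consumes,
# before the broadcast happens.

def shrink_for_broadcast(rows, needed_keys, columns):
    return [
        {c: row[c] for c in columns}
        for row in rows
        if row["dim_id"] in needed_keys
    ]

dims = [
    {"dim_id": 1, "label": "A", "notes": "x" * 1000},  # wide, mostly unused
    {"dim_id": 2, "label": "B", "notes": "y" * 1000},
    {"dim_id": 3, "label": "C", "notes": "z" * 1000},
]
facts = [{"dim_id": 2, "amount": 50}]

needed = {r["dim_id"] for r in facts}
small = shrink_for_broadcast(dims, needed, ["dim_id", "label"])
print(small)  # [{'dim_id': 2, 'label': 'B'}]
```

Dropping unreferenced rows and wide unused columns is often enough to bring the broadcast side back under the executor heap without disabling broadcast entirely.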
Official documentation: https://learn.microsoft.com/en-us/azure/data-factory/data-flow-troubleshoot-guide