Severity: Medium | Category: Data flow
Azure Data Factory Data Flow Error:
DF-Executor-OutOfMemoryError
What does this error mean?
The Spark cluster backing the data flow ran out of heap memory during execution. One or more executor nodes, or the driver, exceeded the available JVM heap, causing Spark to throw a java.lang.OutOfMemoryError and fail the run.
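As a rough intuition, this error means a single partition of the data (plus overhead) no longer fits in one executor's usable heap. A minimal back-of-envelope sketch of that check, where all numbers and the `usable_fraction` overhead factor are illustrative assumptions rather than Spark or ADF defaults:

```python
# Back-of-envelope check: does one partition of a data flow fit in an
# executor's usable heap? Illustrative only; not an ADF sizing formula.

def partition_fits(total_rows, bytes_per_row, num_partitions,
                   executor_heap_gb, usable_fraction=0.6):
    """Return True if one partition's in-memory size fits in the usable heap.

    usable_fraction models the share of JVM heap left for user data after
    Spark's own overhead (an assumed value, not a Spark constant).
    """
    partition_bytes = total_rows * bytes_per_row / num_partitions
    usable_bytes = executor_heap_gb * 1024**3 * usable_fraction
    return partition_bytes <= usable_bytes

# 500M rows x 200 B spread over 200 partitions -> ~0.5 GB per partition: fits.
print(partition_fits(500_000_000, 200, 200, executor_heap_gb=4))   # True
# The same data in only 8 partitions -> ~12.5 GB per partition: OOM.
print(partition_fits(500_000_000, 200, 8, executor_heap_gb=4))     # False
```

The same data volume can succeed or fail depending purely on partitioning, which is why skewed or under-partitioned data flows hit this error even on clusters that look large enough on paper.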
Common causes
1. The data flow transformation is processing more data than the cluster's memory allows: each executor node lacks enough JVM heap for its partition of the data.
2. A broadcast join is caching a full copy of a large dataset in every executor's memory simultaneously.
3. The Azure IR compute size is insufficient for the current data volume; the pipeline has grown beyond its original sizing.
4. A Window or Sort transformation requires Spark to buffer an entire partition in memory, which exceeds the executor's heap when partitions are too large.
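Cause 2 is worth quantifying: a shuffle join splits both sides across partitions, while a broadcast join puts the entire broadcast table into every executor. A simplified per-executor peak-memory model (it ignores Spark's real execution/storage memory accounting, so treat the numbers as illustrative):

```python
# Simplified per-task peak memory for the two join strategies, in GB.
# Real Spark memory accounting is more complex; this shows the shape only.

def shuffle_join_peak_gb(left_gb, right_gb, num_partitions):
    # Sort-merge (shuffle) join: each task holds roughly one partition of
    # each side, so peak memory shrinks as the partition count grows.
    return (left_gb + right_gb) / num_partitions

def broadcast_join_peak_gb(left_gb, right_gb, num_partitions, broadcast_side_gb):
    # Broadcast join: every executor caches the WHOLE broadcast table in
    # addition to its own partition of the streamed side.
    streamed_gb = (left_gb + right_gb - broadcast_side_gb) / num_partitions
    return broadcast_side_gb + streamed_gb

# 100 GB fact table joined to a 5 GB dimension, 200 partitions:
print(shuffle_join_peak_gb(100, 5, 200))        # 0.525 GB per task
print(broadcast_join_peak_gb(100, 5, 200, 5))   # 5.5 GB per executor
```

A 5 GB "small" side that is harmless in a shuffle join can dominate executor memory once broadcast, which is why oversized broadcast hints are a classic trigger for this error.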
How to fix it
1. Increase the Azure IR compute size: in the data flow activity's Settings tab, switch to a larger core count or select the memory-optimized compute type.
2. Disable broadcast hints on join transformations: in each Join transformation, set Broadcast to 'Off' so large datasets are not cached in every executor's memory.
3. Reduce data volume by adding filter conditions at the source transformation so only the necessary rows are processed.
4. Split complex transformations into multiple sequential data flows, writing intermediate results to staging storage to reduce per-run memory usage.
5. Enable debug mode with a row limit of 100–1000 rows to identify which transformation step causes the OOM, then optimize that step.
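For fix 1, the compute size can be set in the data flow activity's Settings tab or directly in the pipeline JSON. A sketch of the relevant fragment, where the activity name `ExecuteMyDataFlow` and data flow name `MyDataFlow` are placeholders; the property names follow the ADF pipeline schema as commonly documented and should be verified against the current Execute Data Flow activity reference:

```json
{
  "name": "ExecuteMyDataFlow",
  "type": "ExecuteDataFlow",
  "typeProperties": {
    "dataFlow": {
      "referenceName": "MyDataFlow",
      "type": "DataFlowReference"
    },
    "compute": {
      "computeType": "MemoryOptimized",
      "coreCount": 16
    }
  }
}
```

Memory-optimized compute roughly doubles the memory per core compared with the general-purpose type, so it is usually the first lever to try for OOM failures before restructuring the data flow itself.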
Official documentation: https://learn.microsoft.com/en-us/azure/data-factory/data-flow-troubleshoot-guide