Severity: Medium | Category: Data flow
Azure Data Factory Data Flow Error:
DF-Executor-OutOfMemoryError
What does this error mean?
The Spark cluster backing the data flow ran out of heap memory during execution. One or more executor nodes, or the driver, exceeded the available JVM heap, causing Spark to throw a java.lang.OutOfMemoryError and fail the run.
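As a rough intuition, this error means a single partition of the data (plus overhead) no longer fits in one executor's usable heap. A minimal back-of-envelope sketch of that check, where all numbers and the `usable_fraction` overhead factor are illustrative assumptions rather than Spark or ADF defaults:

```python
# Back-of-envelope check: does one partition of a data flow fit in an
# executor's usable heap? Illustrative only; not an ADF sizing formula.

def partition_fits(total_rows, bytes_per_row, num_partitions,
                   executor_heap_gb, usable_fraction=0.6):
    """Return True if one partition's in-memory size fits in the usable heap.

    usable_fraction models the share of JVM heap left for user data after
    Spark's own overhead (an assumed value, not a Spark constant).
    """
    partition_bytes = total_rows * bytes_per_row / num_partitions
    usable_bytes = executor_heap_gb * 1024**3 * usable_fraction
    return partition_bytes <= usable_bytes

# 500M rows x 200 B spread over 200 partitions -> ~0.5 GB per partition: fits.
print(partition_fits(500_000_000, 200, 200, executor_heap_gb=4))   # True
# The same data in only 8 partitions -> ~12.5 GB per partition: OOM.
print(partition_fits(500_000_000, 200, 8, executor_heap_gb=4))     # False
```

The same data volume can succeed or fail depending purely on partitioning, which is why skewed or under-partitioned data flows hit this error even on clusters that look large enough on paper.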
Common causes
1. The data flow transformation is processing more data than the cluster's memory allows: each executor node lacks enough JVM heap for its partition of the data.
2. A broadcast join is caching a full copy of a large dataset in every executor's memory simultaneously.
3. The Azure IR compute size is insufficient for the current data volume; the pipeline has grown beyond its original sizing.
4. A Window or Sort transformation requires Spark to buffer an entire partition in memory, which exceeds the executor's heap when partitions are too large.
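Cause 2 is worth quantifying: a shuffle join splits both sides across partitions, while a broadcast join puts the entire broadcast table into every executor. A simplified per-executor peak-memory model (it ignores Spark's real execution/storage memory accounting, so treat the numbers as illustrative):

```python
# Simplified per-task peak memory for the two join strategies, in GB.
# Real Spark memory accounting is more complex; this shows the shape only.

def shuffle_join_peak_gb(left_gb, right_gb, num_partitions):
    # Sort-merge (shuffle) join: each task holds roughly one partition of
    # each side, so peak memory shrinks as the partition count grows.
    return (left_gb + right_gb) / num_partitions

def broadcast_join_peak_gb(left_gb, right_gb, num_partitions, broadcast_side_gb):
    # Broadcast join: every executor caches the WHOLE broadcast table in
    # addition to its own partition of the streamed side.
    streamed_gb = (left_gb + right_gb - broadcast_side_gb) / num_partitions
    return broadcast_side_gb + streamed_gb

# 100 GB fact table joined to a 5 GB dimension, 200 partitions:
print(shuffle_join_peak_gb(100, 5, 200))        # 0.525 GB per task
print(broadcast_join_peak_gb(100, 5, 200, 5))   # 5.5 GB per executor
```

A 5 GB "small" side that is harmless in a shuffle join can dominate executor memory once broadcast, which is why oversized broadcast hints are a classic trigger for this error.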
How to fix it
1. Increase the Azure IR compute size: in the data flow activity's Settings tab, switch to a larger core count or select the memory-optimized compute type.
2. Disable broadcast hints on join transformations: in each Join transformation, set Broadcast to 'Off' so large datasets are not cached in every executor's memory.
3. Reduce data volume by adding filter conditions at the source transformation so only the necessary rows are processed.
4. Split complex transformations into multiple sequential data flows, writing intermediate results to staging storage to reduce per-run memory usage.
5. Enable debug mode with a row limit of 100–1000 rows to identify which transformation step causes the OOM, then optimize that step.
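For fix 1, the compute size can be set in the data flow activity's Settings tab or directly in the pipeline JSON. A sketch of the relevant fragment, where the activity name `ExecuteMyDataFlow` and data flow name `MyDataFlow` are placeholders; the property names follow the ADF pipeline schema as commonly documented and should be verified against the current Execute Data Flow activity reference:

```json
{
  "name": "ExecuteMyDataFlow",
  "type": "ExecuteDataFlow",
  "typeProperties": {
    "dataFlow": {
      "referenceName": "MyDataFlow",
      "type": "DataFlowReference"
    },
    "compute": {
      "computeType": "MemoryOptimized",
      "coreCount": 16
    }
  }
}
```

Memory-optimized compute roughly doubles the memory per core compared with the general-purpose type, so it is usually the first lever to try for OOM failures before restructuring the data flow itself.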
Official documentation: https://learn.microsoft.com/en-us/azure/data-factory/data-flow-troubleshoot-guide