Severity: Medium | Category: Infrastructure

Power BI Refresh Error:
DF-Executor-OutOfMemorySparkError

What does this error mean?

A Spark executor process ran out of Java heap memory during data flow execution. Unlike the general OutOfMemoryError, this error originates specifically in the Spark executor layer — the JVM running on the worker node raised an OutOfMemoryError, crashing the executor task.

Common causes

  • The Azure IR compute size is too small for the data volume — the default 8-core general-purpose IR has limited heap per executor
  • A wide transformation (many columns, complex expressions) increases per-row memory cost and exhausts executor heap on large inputs
  • A join or aggregation generates an intermediate result that does not fit in executor memory — data is not partitioned finely enough
  • Multiple memory-heavy transformations in sequence (joins, pivots, aggregations) compound heap pressure without intermediate checkpoints
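In raw Spark terms, the causes above map to a handful of tunables. ADF data flows manage the Spark cluster for you, so the names below are the underlying Spark settings (which the IR size and transformation options control indirectly), not ADF options you set yourself; the values are illustrative assumptions, not recommendations:

```python
# Underlying Spark settings relevant to executor-heap OOMs.
# Values are illustrative only; ADF data flows do not expose these
# directly -- they are shown to explain what the IR sizing controls.
spark_conf = {
    # Finer partitioning keeps each shuffle slice smaller (cause 3).
    "spark.sql.shuffle.partitions": "400",
    # -1 disables automatic broadcast joins, which otherwise pull an
    # entire table into each executor's heap.
    "spark.sql.autoBroadcastJoinThreshold": "-1",
    # Heap available to each executor JVM (cause 1).
    "spark.executor.memory": "8g",
}

def conf_lines(conf: dict) -> list:
    """Render the settings as --conf arguments for a spark-submit call."""
    return ["--conf {}={}".format(k, v) for k, v in sorted(conf.items())]
```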

How to fix it

  1. Increase the Azure IR compute size — select a larger core count or use memory-optimized compute in the data flow activity Settings tab.
  2. Add source filters to reduce the volume of data processed in a single run.
  3. Disable all broadcast hints in join transformations to prevent Spark from attempting large in-memory broadcasts.
  4. Enable data flow debug mode with a small row count to identify which transformation step is consuming the most memory.
  5. Consider partitioning the data flow into multiple runs processing date or key ranges in sequence to keep per-run memory within limits.
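Step 5 (splitting one large run into sequential date-range runs) can be sketched as a small helper that yields the window boundaries you would pass as pipeline parameters. The window size is a hypothetical choice and depends on your data volume:

```python
from datetime import date, timedelta

def date_windows(start, end, days_per_run):
    """Yield (window_start, window_end) pairs covering [start, end),
    so each pipeline run processes a bounded slice of data."""
    cursor = start
    step = timedelta(days=days_per_run)
    while cursor < end:
        window_end = min(cursor + step, end)
        yield cursor, window_end
        cursor = window_end

# Example: split Q1 into roughly monthly runs.
windows = list(date_windows(date(2024, 1, 1), date(2024, 4, 1), 31))
```

Each `(window_start, window_end)` pair would then feed the source filter of one pipeline run, keeping per-run memory bounded.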

Frequently asked questions

What is the difference between OutOfMemorySparkError and OutOfMemoryError?

OutOfMemorySparkError is a JVM heap OOM on the Spark executor. OutOfMemoryError is broader — it can include off-heap storage memory exhaustion or driver memory issues. Both usually call for resizing the IR; OutOfMemorySparkError specifically points at executor heap.

How do I know which transformation is consuming the most memory?

Enable debug mode and run with a small sample. The Spark UI link in ADF monitoring shows per-stage memory usage and shuffle spill. Transformations after a join or aggregation that retain all columns are common culprits.

Will increasing IR cores fix this, or do I need memory-optimized compute?

More cores give more executors and smaller per-partition slices — better for source-heavy flows. Memory-optimized compute increases heap per executor — better for join-heavy or aggregation-heavy flows.
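The trade-off can be made concrete with a rough back-of-envelope calculation. The memory-per-core ratios and executor layout below are hypothetical placeholders, not published ADF figures; the point is only that memory-optimized compute raises heap per executor, while adding cores shrinks each partition instead:

```python
def heap_per_executor(total_cores, gb_per_core,
                      cores_per_executor=4,
                      jvm_overhead_fraction=0.1):
    """Rough usable heap (GB) per executor after JVM overhead.
    All ratios here are illustrative assumptions, not ADF specs."""
    executors = total_cores // cores_per_executor
    total_gb = total_cores * gb_per_core
    return (total_gb / executors) * (1 - jvm_overhead_fraction)

# Hypothetical ratios: general purpose ~4 GB/core, memory optimized ~8 GB/core.
general = heap_per_executor(16, 4.0)  # 14.4 GB usable heap per executor
memopt = heap_per_executor(16, 8.0)   # 28.8 GB usable heap per executor
```

Under these assumed ratios, the same 16-core IR gives each executor twice the heap on memory-optimized compute — which is why it suits join- and aggregation-heavy flows.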

Will downstream Power BI datasets be affected?

Yes — the pipeline fails and no data is written to the target. Dependent datasets serve stale figures until the pipeline completes successfully.

Official documentation: https://learn.microsoft.com/en-us/azure/data-factory/data-flow-troubleshoot-guide
