
Power BI Refresh Error:
DF-Executor-OutOfDiskSpaceError

What does this error mean?

The Spark executor ran out of local disk space on the Azure IR compute node. Data flows use local disk for shuffle spill files when in-memory buffers are exhausted — if the spilled data volume exceeds the local disk capacity, the executor fails with an out-of-disk error.

Common causes

  • The data volume being processed per partition exceeds the local disk capacity available per executor node in the Azure IR
  • A large shuffle operation (sort, group by, join) is spilling more data to disk than the node's temporary storage can accommodate
  • Multiple concurrent data flows are running on the same IR, competing for the limited local disk on each node
  • Intermediate files from a previous failed run were not cleaned up and are consuming disk space alongside the current run
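The first cause above is a capacity question you can sanity-check before rerunning the flow. Below is a minimal back-of-the-envelope sketch; the disk size, node count, and safety factor are illustrative assumptions, not Azure IR specifications — check your IR's actual compute configuration.

```python
def spill_fits(total_shuffle_gb, node_count, local_disk_gb, safety_factor=0.8):
    """Rough check: does the estimated shuffle spill per node stay under
    usable local disk? All inputs are estimates supplied by the caller."""
    spill_per_node = total_shuffle_gb / node_count
    # Leave headroom: shuffle files, logs, and temp data share the same disk.
    return spill_per_node <= local_disk_gb * safety_factor

# Example: ~2 TB shuffled across 8 nodes with ~256 GB local disk each
# (hypothetical figures) -> 256 GB/node exceeds 204.8 GB usable.
print(spill_fits(2048, 8, 256))  # False
```

If this check fails for your estimated volumes, the fixes below (larger nodes, smaller batches, more partitions) are the levers that change its inputs.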

How to fix it

  1. Increase the IR compute type to use nodes with more local storage — the Azure IR node running the Spark executor has run out of local disk space for shuffle or spill files.
  2. Reduce the amount of data processed in a single run by adding source filters or partitioning the pipeline into smaller batches.
  3. Disable caching transformations that write intermediate data to local disk, or restrict them to in-memory cache for smaller datasets.
  4. Check whether intermediate files from previous failed runs are accumulating on the IR nodes and consuming disk — retry with a fresh cluster session.
  5. Increase the partition count in the data flow to reduce the per-partition data size spilled to disk during shuffle operations.
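For steps 2 and 5, a simple way to reason about a target partition count is to work backwards from a per-partition size budget. This is a hedged sketch with an assumed 200 MB-per-partition target (a common rule of thumb, not an ADF requirement); the function name and defaults are hypothetical.

```python
import math

def suggest_partitions(total_gb, target_partition_mb=200):
    """Pick a partition count so each partition holds roughly
    target_partition_mb of data; smaller partitions spill less per task."""
    total_mb = total_gb * 1024
    return max(1, math.ceil(total_mb / target_partition_mb))

# Example: a 500 GB dataset at ~200 MB per partition (assumed budget).
print(suggest_partitions(500))  # 2560
```

Enter a value like this in the data flow's Optimize tab and adjust based on monitoring; very high counts add task-scheduling overhead.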

Frequently asked questions

What is shuffle spill and why does it consume disk?

During shuffle (sort, group by, join), Spark redistributes data across executors. Partitions that don't fit in memory spill to local disk — large shuffles on small IR nodes produce large spill files.
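The spill mechanism can be modeled crudely: whatever a shuffle task cannot hold in its in-memory buffer goes to local disk. The buffer size below is an arbitrary illustrative number, not a Spark or Azure IR default.

```python
def simulate_shuffle(partition_mb, memory_buffer_mb):
    """Toy model of shuffle spill: data beyond the in-memory buffer
    is written to the executor's local disk."""
    spilled = max(0, partition_mb - memory_buffer_mb)
    return {"in_memory_mb": min(partition_mb, memory_buffer_mb),
            "spilled_mb": spilled}

# A 1.5 GB partition against an assumed 512 MB buffer spills ~1 GB to disk.
print(simulate_shuffle(1500, 512))  # {'in_memory_mb': 512, 'spilled_mb': 988}
```

This is why node-local disk, not just memory, becomes the bottleneck for large shuffles.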

How does increasing partition count help with disk space?

More partitions means smaller spill files per partition. Start by doubling the partition count (Optimize tab) and monitor whether disk usage drops below capacity. Too many partitions add overhead.
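The doubling strategy above can be sketched as a loop: keep doubling until the estimated spill per partition drops under a chosen budget. The 1 GB default target is an assumption for illustration, not a documented threshold.

```python
def double_until_under(total_spill_gb, partitions, target_gb_per_partition=1.0):
    """Double the partition count until estimated spill per partition
    falls to or below the target. Inputs are caller-supplied estimates."""
    while total_spill_gb / partitions > target_gb_per_partition:
        partitions *= 2
    return partitions

# Example: ~400 GB of spill starting from 50 partitions (assumed figures).
print(double_until_under(400, 50))  # 400
```

In practice you would apply each doubling in the Optimize tab and confirm with a monitored run rather than iterating blindly.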

Is OutOfDiskSpace the same as OutOfMemory?

No — OOM means Spark can't allocate heap memory. OutOfDiskSpace means shuffle spill files filled the executor's local disk. OOM needs more memory; OutOfDiskSpace needs more disk or less data per partition.

Will downstream Power BI datasets be affected?

Yes — the pipeline fails and the target table receives no new data. Dependent datasets serve stale figures.

Official documentation: https://learn.microsoft.com/en-us/azure/data-factory/data-flow-troubleshoot-guide
