Databricks Error: OUT_OF_MEMORY
What does this error mean?
The Databricks cluster ran out of memory while executing a query. The query needed more memory than the driver or an executor had available, so the affected process was killed.
Common causes
1. A Cartesian join (cross join) producing an unexpectedly large intermediate result
2. A shuffle operation creating too few, oversized partitions or one very large partition (data skew)
3. Collecting a large dataset to the driver with collect() or toPandas()
4. Broadcasting a table that is too large to fit in memory, for example through an explicit broadcast hint or a raised spark.sql.autoBroadcastJoinThreshold
5. Processing significantly more data than usual due to upstream data growth
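To make the first two causes concrete, here is a minimal back-of-envelope sketch in plain Python. The helper names (cross_join_rows, skew_ratio) and all row counts are hypothetical and chosen only for illustration. Per-partition row counts would have to come from your own job, for example from the task table of a stage in the Spark UI.

```python
from statistics import median


def cross_join_rows(left_rows: int, right_rows: int) -> int:
    """A cross join emits every left/right pairing, so rows multiply."""
    return left_rows * right_rows


def skew_ratio(partition_rows: list[int]) -> float:
    """Largest partition divided by the median partition.

    A ratio near 1 means evenly sized partitions; a large ratio means
    one task handles far more data than a typical task.
    """
    if not partition_rows:
        raise ValueError("partition_rows must not be empty")
    med = median(partition_rows)
    if med == 0:
        return float("inf") if max(partition_rows) > 0 else 1.0
    return max(partition_rows) / med


# Hypothetical inputs: a 1M-row table crossed with a 50K-row table.
print(cross_join_rows(1_000_000, 50_000))

# Hypothetical per-partition row counts with one oversized partition.
print(skew_ratio([100, 100, 100, 900]))
```

What counts as "too skewed" depends on the workload, so the sketch reports the ratio rather than applying a fixed threshold.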
How to fix it
1. Identify the query step that consumes the most memory using the Spark UI (look for the stage with the highest memory spill).
2. Check for data skew: if one partition is much larger than the others, salt the skewed key or repartition.
3. Replace collect() and toPandas() on large datasets with distributed aggregations, and pull only summary results to the driver.
4. Increase the number of shuffle partitions (spark.sql.shuffle.partitions) to reduce per-partition size.
5. Increase the cluster size or switch to a memory-optimized instance type.
6. Treat spill to disk as a safety valve rather than a fix: Spark spills shuffle and aggregation data automatically when execution memory fills, so make sure workers have enough local disk and expect slower stages.
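The salting and shuffle-partition steps above can be sketched in plain Python, so the idea can be checked without a cluster. Everything here is an assumption for illustration: the 128 MiB target partition size is a common rule of thumb rather than a Databricks-mandated value, the CRC-based partition_of is only a stand-in for Spark's own hash partitioning, and the salt is derived from the row index where a real job would typically use a random number.

```python
import math
import zlib
from collections import Counter

# Assumption: aim for roughly 128 MiB of shuffle data per partition.
TARGET_PARTITION_BYTES = 128 * 1024**2


def suggested_shuffle_partitions(
    shuffle_bytes: int, target_bytes: int = TARGET_PARTITION_BYTES
) -> int:
    """Partition count that keeps each partition near the target size."""
    return max(1, math.ceil(shuffle_bytes / target_bytes))


def partition_of(key: str, num_partitions: int) -> int:
    """Stand-in for hash partitioning: the same key always lands together."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def max_partition_load(keys: list[str], num_partitions: int) -> int:
    """Row count of the busiest partition for the given keys."""
    loads = Counter(partition_of(k, num_partitions) for k in keys)
    return max(loads.values())


def salt(key: str, row_index: int, salt_buckets: int) -> str:
    """Append a bucket suffix so one hot key becomes several distinct keys."""
    return f"{key}_{row_index % salt_buckets}"


# Hypothetical skewed input: every row shares the same key.
hot_keys = ["hot"] * 1000
salted_keys = [salt(k, i, 8) for i, k in enumerate(hot_keys)]

unsalted_max = max_partition_load(hot_keys, 16)
salted_max = max_partition_load(salted_keys, 16)
print(unsalted_max, salted_max)

# Hypothetical 200 GiB shuffle.
print(suggested_shuffle_partitions(200 * 1024**3))
```

In a notebook the suggested count would then be applied with spark.conf.set("spark.sql.shuffle.partitions", ...). Note that results aggregated on a salted key need a second aggregation on the original key to be combined.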
Example log output
org.apache.spark.SparkException: Job aborted due to stage failure: Task 14 in stage 3.0 failed 4 times. Most recent failure: Lost task 14.3 in stage 3.0 (TID 612, 10.0.1.8, executor 3): ExecutorLostFailure (executor 3 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 28.2 GB of 28 GB physical memory used.
Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2844)
SQLSTATE: 53000
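When triaging many failures, it can help to pull the memory figures out of messages like the one in the example log. The following is a minimal sketch that assumes the message keeps the exact "X GB of Y GB physical memory used" wording shown above. The function name parse_memory_overrun is hypothetical, and other runtimes may phrase the message differently.

```python
import re

# Binary multipliers for the unit suffixes the pattern accepts.
_UNITS = {"KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

# Matches e.g. "28.2 GB of 28 GB physical memory used".
_PATTERN = re.compile(
    r"(?P<used>\d+(?:\.\d+)?)\s*(?P<used_unit>[KMGT]B) of "
    r"(?P<limit>\d+(?:\.\d+)?)\s*(?P<limit_unit>[KMGT]B) physical memory used"
)


def parse_memory_overrun(log_line: str):
    """Return (used_bytes, limit_bytes, used/limit), or None if no match."""
    match = _PATTERN.search(log_line)
    if match is None:
        return None
    used = float(match["used"]) * _UNITS[match["used_unit"]]
    limit = float(match["limit"]) * _UNITS[match["limit_unit"]]
    return used, limit, used / limit


line = (
    "Reason: Container killed by YARN for exceeding memory limits. "
    "28.2 GB of 28 GB physical memory used."
)
result = parse_memory_overrun(line)
if result is not None:
    used_bytes, limit_bytes, ratio = result
    print(f"used/limit ratio: {ratio:.4f}")
```

A ratio only slightly above 1, as in this example, suggests the container was marginally over its limit. A much larger ratio would point more strongly at skew or an oversized collect or broadcast.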