Severity: High
Power BI Refresh Error: OUT_OF_MEMORY
What does this error mean?
The Databricks cluster ran out of memory while executing a query. The query required more memory than the cluster had available, causing Spark to kill the executor (or the driver) that exceeded its limit, which fails the Power BI refresh.
Common causes
1. A cartesian join (cross join) producing an unexpectedly large intermediate result
2. A shuffle producing oversized partitions: too few partitions overall, or one very large partition caused by data skew
3. Collecting a large dataset to the driver with collect() or toPandas()
4. Broadcasting a large table that exceeds the broadcast threshold
5. Processing significantly more data than usual due to upstream data growth
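To see why cause 1 blows up memory, here is a minimal plain-Python sketch (toy lists stand in for tables; no Spark required). A cross join pairs every row on one side with every row on the other, so the intermediate result has |left| × |right| rows:

```python
from itertools import product

# Toy stand-ins for two tables (illustrative data only).
left = [("order", i) for i in range(1_000)]    # 1,000 rows
right = [("customer", j) for j in range(500)]  # 500 rows

# A cross join pairs every left row with every right row, so the
# intermediate result is len(left) * len(right) rows.
cross_rows = sum(1 for _ in product(left, right))
print(cross_rows)  # 500000
```

At 1,000 × 500 rows this is harmless; with millions of rows per side, the intermediate result easily exceeds executor memory.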
How to fix it
1. Identify the query step that is consuming the most memory using the Spark UI (look for the stage with the highest memory spill)
2. Check for data skew: if one partition is much larger than the others, use salting or repartition
3. Replace collect() with distributed aggregations and pull only summary results to the driver
4. Increase the number of shuffle partitions (spark.sql.shuffle.partitions) to reduce per-partition size
5. Increase the cluster size or switch to a memory-optimized instance type
6. Allow Spark to spill to disk as a safety valve (at the cost of performance)
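Step 2 (salting a skewed key) can be sketched in plain Python, independent of Spark. Assume one hot key receives almost all rows: appending a random salt suffix spreads those rows across several sub-keys, each small enough to process on one partition. The key names and salt count here are illustrative:

```python
import random
from collections import Counter

random.seed(0)  # deterministic for the example
NUM_SALTS = 4

# Skewed input: one hot key holds almost all of the rows.
rows = ["hot_key"] * 1_000 + ["rare_key"] * 10

# Salting: append a random suffix so the hot key's rows land in
# NUM_SALTS different partitions instead of one. (In a real join, the
# other side would be duplicated once per salt value.)
salted = [f"{key}#{random.randrange(NUM_SALTS)}" for key in rows]

sizes = Counter(salted)
largest = max(sizes.values())
print(largest)  # roughly 1_000 / NUM_SALTS rows, instead of 1_000
```

The largest partition shrinks from 1,000 rows to roughly 1,000 / NUM_SALTS, at the cost of replicating the small side of the join.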