Severity: High · Category: SQL

Power BI Refresh Error:
OUT_OF_MEMORY

What does this error mean?

The Databricks cluster ran out of memory while executing a query. The query consumed more memory than the cluster had available, causing an executor or the driver process to be killed.

Common causes

  • A cartesian join (cross join) producing an unexpectedly large intermediate result
  • A shuffle operation creating too many small partitions or one very large partition (data skew)
  • Collecting a large dataset to the driver with collect() or toPandas()
  • Broadcasting a large table that exceeds the broadcast threshold
  • Processing significantly more data than usual due to upstream data growth
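The first cause above can be illustrated without Spark at all: a join that drops its key condition multiplies row counts instead of matching them. This is a minimal plain-Python sketch with made-up table sizes (10,000 orders, 100 customers):

```python
from itertools import product

# Hypothetical tables: every order references exactly one customer.
orders = [{"order_id": i, "customer": i % 100} for i in range(10_000)]
customers = [{"customer": i} for i in range(100)]

# Keyed join: result size stays proportional to the inputs.
keyed = [o for o in orders for c in customers if o["customer"] == c["customer"]]

# Cartesian (cross) join: materializes every pairing of the two tables.
cross = list(product(orders, customers))

print(len(keyed))  # 10,000 rows — one match per order
print(len(cross))  # 1,000,000 rows — a 100x larger intermediate result
```

At real table sizes the same blowup is what exhausts executor memory: a missing join condition turns a modest result into billions of intermediate rows.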

How to fix it

  1. Identify the query step that is consuming the most memory using the Spark UI (look for the stage with the highest memory spill)
  2. Check for data skew — if one partition is much larger than others, use salting or repartitioning
  3. Replace collect() with distributed aggregations and only pull summary results to the driver
  4. Increase the number of shuffle partitions (spark.sql.shuffle.partitions) to reduce per-partition size
  5. Increase the cluster size or switch to a memory-optimized instance type
  6. Enable Spark memory spill to disk as a safety valve (at the cost of performance)
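The salting idea in step 2 can be sketched in plain Python (no Spark required). The key distribution and partition count below are illustrative assumptions: 90% of rows share one hot key, and we hash into 8 partitions the way a shuffle would.

```python
import random
import zlib
from collections import Counter

random.seed(0)  # deterministic for the example

# Skewed dataset: 9,000 rows share one hot key, 1,000 rows are unique.
rows = ["hot_key"] * 9_000 + [f"key_{i}" for i in range(1_000)]

def partition(key, n=8):
    # Deterministic hash partitioner, standing in for Spark's shuffle hash.
    return zlib.crc32(key.encode()) % n

# Without salting: every hot_key row hashes to the same partition.
skewed = Counter(partition(k) for k in rows)

def salted(key, n=8):
    # Salting: append a random suffix so the hot key spreads across n partitions.
    return partition(f"{key}_{random.randrange(n)}", n)

balanced = Counter(salted(k) for k in rows)

print(max(skewed.values()))    # one partition holds 9,000+ rows
print(max(balanced.values()))  # largest partition shrinks dramatically
```

In real Spark code the salt column is added before the shuffle (and the matching side of a join is exploded across the same salt values); the principle is the same: no single partition has to hold the entire hot key in memory.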

Frequently asked questions

Can I fix OOM without increasing cluster size?

Often yes — query optimization (fixing skew, avoiding cartesian joins, reducing collect() usage) can dramatically reduce memory usage without adding nodes.

What is data skew and how do I detect it?

Data skew occurs when one partition has much more data than others. You can detect it in the Spark UI by looking for a stage where one task takes much longer than the others.
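The "one task takes much longer" signal can be turned into a quick check. This sketch uses hypothetical per-task durations exported from a Spark stage; the 5x threshold is an illustrative rule of thumb, not a Spark default:

```python
# Hypothetical per-task durations (seconds) for one Spark stage.
durations = [12, 14, 13, 11, 13, 12, 184, 14]

median = sorted(durations)[len(durations) // 2]
worst = max(durations)
ratio = worst / median

# A single straggler far above the median is the signature of data skew.
print(f"max/median ratio: {ratio:.1f}")
if ratio > 5:
    print("likely data skew — inspect the partitioning key for this stage")
```

The same comparison works on shuffle-read sizes per task: if the bytes read by the slowest task dwarf the median, the skew is in the data, not the hardware.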
