MetricSign
High severity · cluster

Power BI Refresh Error: DRIVER_NOT_RESPONDING

What does this error mean?

The Databricks driver is running but has stopped responding to heartbeats from the control plane. Unlike COMMUNICATION_LOST, the driver process is still alive but unresponsive, typically because of a long garbage collection pause or extreme CPU load.

Common causes

  1. A long JVM garbage collection pause on the driver (common with large in-memory datasets)
  2. Heavy Python-side processing blocking the main thread (e.g., a large pandas operation on the driver)
  3. Collecting a very large dataset to the driver using collect() or toPandas()
  4. Broadcasting an extremely large variable to all executors, blocking the driver
  5. CPU starvation on the driver node due to excessive task scheduling overhead
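The collect() and toPandas() cause is usually the easiest one to spot in code. As a minimal sketch, the guard below refuses to pull a DataFrame onto the driver unless it is small. The helper name `to_pandas_bounded` and the `MAX_DRIVER_ROWS` limit are hypothetical; the only PySpark calls it relies on are `DataFrame.limit()`, `DataFrame.count()`, and `DataFrame.toPandas()`.

```python
# Hypothetical row budget for the driver; tune it to your driver's memory.
MAX_DRIVER_ROWS = 100_000


def to_pandas_bounded(df, max_rows=MAX_DRIVER_ROWS):
    """Convert a Spark DataFrame to pandas only if it has at most max_rows rows.

    `df` is expected to be a pyspark.sql.DataFrame. Counting a limited
    DataFrame runs a Spark job, but avoids counting the whole dataset.
    """
    row_count = df.limit(max_rows + 1).count()
    if row_count > max_rows:
        raise ValueError(
            f"DataFrame has more than {max_rows} rows; "
            "aggregate or filter on the executors before collecting"
        )
    return df.toPandas()
```

A typical use would be to aggregate first and collect only the result, for example `to_pandas_bounded(df.groupBy("region").count())`, where `region` is an illustrative column name.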

How to fix it

  1. Check driver logs for GC pause duration. If pauses exceed 60 seconds, the driver may appear unresponsive
  2. Avoid using collect() or toPandas() on large datasets. Process the data in a distributed way on the executors instead
  3. Increase driver memory and tune GC: choose a larger driver node type or raise spark.driver.memory, and add '-XX:+UseG1GC' to spark.driver.extraJavaOptions. Spark does not allow '-Xmx' in spark.driver.extraJavaOptions; the heap size comes from the driver memory setting
  4. Break up large operations that run entirely on the driver into smaller distributed operations
  5. Increase the heartbeat timeout threshold if occasional GC pauses are acceptable: set spark.network.timeout to a value above its 120-second default
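Checking the driver log for long GC pauses can be partly automated. The sketch below assumes the log uses the JDK unified GC logging format, where a pause line ends with a duration in milliseconds. Your Databricks runtime may log GC events in a different shape, in which case the regular expression needs adjusting. The sample lines are made up for illustration, and `long_gc_pauses` is a hypothetical helper.

```python
import re

# Assumed line shape (JDK unified GC logging): "... Pause <kind> ... <number>ms"
PAUSE_RE = re.compile(r"\bPause\b.*?(\d+(?:\.\d+)?)ms\s*$")


def long_gc_pauses(lines, threshold_s=60.0):
    """Return (pause_seconds, line) for every GC pause longer than threshold_s."""
    hits = []
    for line in lines:
        match = PAUSE_RE.search(line)
        if match:
            seconds = float(match.group(1)) / 1000.0
            if seconds > threshold_s:
                hits.append((seconds, line.rstrip()))
    return hits


# Illustrative log lines, not taken from a real cluster.
sample = [
    "[12.345s][info][gc] GC(7) Pause Young (Normal) (G1 Evacuation Pause) 512M->128M(1024M) 35.120ms",
    "[980.100s][info][gc] GC(41) Pause Full (Allocation Failure) 7900M->7800M(8192M) 74213.500ms",
]

for seconds, line in long_gc_pauses(sample):
    print(f"{seconds:.1f}s pause: {line}")
```

With the sample above, only the second line is reported, because its pause of roughly 74 seconds exceeds the 60-second threshold.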

Frequently asked questions

Is DRIVER_NOT_RESPONDING the same as COMMUNICATION_LOST?

No. DRIVER_NOT_RESPONDING means the driver process is alive but frozen. COMMUNICATION_LOST means the control plane cannot reach the driver at all (often because the driver was killed).

What is the default heartbeat timeout in Databricks?

The default Spark network timeout is 120 seconds. You can increase it via spark.network.timeout in the cluster Spark configuration.
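When choosing a new value, it can help to compare the timeout against the longest GC pause seen in the driver log. The following is a rough sketch: the helper names, the 1.5x safety margin, and the 95-second pause are all assumptions for illustration, and the parser handles only a subset of Spark's time suffixes.

```python
import re

# Subset of the time suffixes Spark accepts, expressed in seconds.
_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "min": 60.0, "h": 3600.0}


def spark_time_to_seconds(value):
    """Parse a Spark-style time string such as '120s' or '5m' into seconds."""
    match = re.fullmatch(r"(\d+)\s*(ms|s|min|m|h)", value.strip())
    if not match:
        raise ValueError(f"unsupported time string: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def timeout_covers_pause(timeout, longest_pause_s, margin=1.5):
    """True if the timeout is at least `margin` times the longest observed pause."""
    return spark_time_to_seconds(timeout) >= longest_pause_s * margin


# Hypothetical longest pause of 95 seconds from the driver log.
print(timeout_covers_pause("120s", 95.0))  # default timeout
print(timeout_covers_pause("300s", 95.0))  # candidate new value
```

A value that passes the check would then go into the cluster Spark configuration as a line such as `spark.network.timeout 300s`.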
