Power BI Refresh Error: DRIVER_NOT_RESPONDING

Severity: High | Category: cluster

What does this error mean?

The Databricks driver process is still running but has stopped responding to heartbeats from the control plane. Unlike COMMUNICATION_LOST, the driver is alive but unresponsive, typically because of a long garbage collection pause or extreme CPU load.

Common causes

1. A long JVM garbage collection pause on the driver (common with large in-memory datasets)
2. Heavy Python-side processing blocking the main thread (e.g., a large pandas operation on the driver)
3. Collecting a very large dataset to the driver using collect() or toPandas()
4. Broadcasting an extremely large variable to all executors, which blocks the driver
5. CPU starvation on the driver node due to excessive task scheduling overhead
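Causes 3 and 4 both come down to pulling more data onto the driver than its heap can comfortably hold. A rough pre-flight check can be sketched in plain Python. The function names, the 25% headroom fraction, and the idea of estimating size from a row count and an average row width are illustrative assumptions, not a Spark or Databricks API.

```python
def estimated_bytes(row_count: int, avg_row_bytes: int) -> int:
    """Rough raw size of a collected result (ignores JVM/Python object overhead)."""
    return row_count * avg_row_bytes


def safe_to_collect(row_count: int, avg_row_bytes: int,
                    driver_heap_bytes: int, max_fraction: float = 0.25) -> bool:
    """Hypothetical guard: allow collect()/toPandas() only when the estimated
    result would use at most `max_fraction` of the driver heap."""
    if not 0 < max_fraction <= 1:
        raise ValueError("max_fraction must be in (0, 1]")
    return estimated_bytes(row_count, avg_row_bytes) <= max_fraction * driver_heap_bytes


GIB = 1024 ** 3
print(safe_to_collect(1_000_000, 200, 8 * GIB))    # about 0.2 GB against an 8 GiB heap
print(safe_to_collect(500_000_000, 200, 8 * GIB))  # about 100 GB against an 8 GiB heap
```

In a real job the row count could come from df.count(), and the actual footprint after toPandas() is usually larger than the raw estimate, which is why the sketch keeps a conservative fraction.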

How to fix it

1. Check the driver logs for GC pause duration: if pauses exceed 60 seconds, the driver may appear unresponsive.
2. Avoid collect() or toPandas() on large datasets; keep the processing distributed on the executors instead.
3. Give the driver more memory and tune GC: choose a larger driver node type (Spark sizes the driver heap through spark.driver.memory, and its documentation states that -Xmx must not be set in spark.driver.extraJavaOptions), and add -XX:+UseG1GC to spark.driver.extraJavaOptions.
4. Break up large operations that run entirely on the driver into smaller, distributed operations.
5. If occasional GC pauses are acceptable, raise the timeout threshold by setting spark.network.timeout to a larger value.
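Step 1 can be partly automated. The sketch below scans GC log lines for stop-the-world pauses longer than 60 seconds. It assumes the driver's GC log uses one of two common HotSpot formats: JDK 9+ unified logging, where pause lines end with a duration in ms, or the older JDK 8 PrintGCDetails style, which reports ", N secs]". The function names, regexes, and sample lines are illustrative assumptions; real log lines may need the patterns adjusted.

```python
import re
from typing import Iterable, List, Optional

# Assumed formats (HotSpot GC logging):
#   JDK 9+ unified: "... GC(41) Pause Full (G1 Compaction Pause) 4000M->3900M(4096M) 72514.800ms"
#   JDK 8 legacy:   "... [Full GC (Allocation Failure)  3900M->3850M(4096M), 61.2500000 secs]"
_UNIFIED_MS = re.compile(r"Pause.*?(\d+(?:\.\d+)?)ms\s*$")
_LEGACY_SECS = re.compile(r"GC.*?,\s*(\d+(?:\.\d+)?)\s+secs\]")


def pause_seconds(line: str) -> Optional[float]:
    """Return the pause duration in seconds, or None if the line is not a pause."""
    m = _UNIFIED_MS.search(line)
    if m:
        return float(m.group(1)) / 1000.0  # unified logging reports milliseconds
    m = _LEGACY_SECS.search(line)
    if m:
        return float(m.group(1))  # legacy logging reports seconds
    return None


def long_pauses(lines: Iterable[str], threshold_s: float = 60.0) -> List[float]:
    """Collect pause durations (seconds) that exceed threshold_s."""
    pauses = []
    for line in lines:
        secs = pause_seconds(line)
        if secs is not None and secs > threshold_s:
            pauses.append(secs)
    return pauses


sample = [
    "[2024-05-01T10:00:01.000+0000][5.123s][info][gc] GC(3) Pause Young (Normal) "
    "(G1 Evacuation Pause) 512M->128M(4096M) 35.210ms",
    "[2024-05-01T10:05:00.000+0000][304.9s][info][gc] GC(41) Pause Full "
    "(G1 Compaction Pause) 4000M->3900M(4096M) 72514.800ms",
    "2024-05-01T10:06:00.000+0000: 365.001: [Full GC (Allocation Failure)  "
    "3900M->3850M(4096M), 61.2500000 secs]",
]
print(long_pauses(sample))  # two pauses, roughly 72.5 s and 61.25 s
```

Any pause this scan reports is a candidate explanation for a heartbeat being missed; pauses far below the threshold point toward the other causes instead.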

Frequently asked questions

Is DRIVER_NOT_RESPONDING the same as COMMUNICATION_LOST?

No. DRIVER_NOT_RESPONDING means the driver process is alive but frozen, while COMMUNICATION_LOST means the control plane cannot reach the driver at all (often because the driver was killed).
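The distinction can be expressed as a toy decision rule. This is a simplified model, not how the Databricks control plane is actually implemented: DriverProbe, classify, and the idea of a separate process-liveness check are hypothetical, and the 120-second default only mirrors Spark's default spark.network.timeout.

```python
from dataclasses import dataclass
from enum import Enum


class DriverStatus(Enum):
    HEALTHY = "HEALTHY"
    DRIVER_NOT_RESPONDING = "DRIVER_NOT_RESPONDING"
    COMMUNICATION_LOST = "COMMUNICATION_LOST"


@dataclass
class DriverProbe:
    process_alive: bool             # hypothetical: is the driver process reachable at all?
    seconds_since_heartbeat: float  # age of the last heartbeat the driver answered


def classify(probe: DriverProbe, timeout_s: float = 120.0) -> DriverStatus:
    """Toy model: a dead or unreachable driver is 'lost'; a live driver whose
    last heartbeat is older than the timeout is 'not responding'."""
    if not probe.process_alive:
        return DriverStatus.COMMUNICATION_LOST
    if probe.seconds_since_heartbeat > timeout_s:
        return DriverStatus.DRIVER_NOT_RESPONDING
    return DriverStatus.HEALTHY


print(classify(DriverProbe(True, 90.0)).value)   # HEALTHY
print(classify(DriverProbe(True, 150.0)).value)  # DRIVER_NOT_RESPONDING
print(classify(DriverProbe(False, 5.0)).value)   # COMMUNICATION_LOST
```

In this model a 90-second GC pause stays under the default timeout, while a 150-second pause crosses it. Raising the timeout only moves that boundary; it does not shorten the pause itself.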

What is the default heartbeat timeout in Databricks?

The default Spark network timeout is 120 seconds. You can increase it via spark.network.timeout in the cluster Spark configuration.
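Raising the timeout is a one-line change in the cluster's Spark config, which takes one space-separated key-value pair per line. The helper below is a hypothetical sketch that renders those lines and refuses a heartbeat interval that is not well below the timeout. Spark's configuration documentation says spark.executor.heartbeatInterval (default 10s) should be significantly less than spark.network.timeout; the 10x margin used here is an arbitrary choice for illustration, and 300s is an example value, not a recommendation. Explicit units are required to avoid any ambiguity about how a bare number is read.

```python
import re

# Seconds per unit for the simple time strings this sketch accepts.
_UNITS = {"ms": 0.001, "s": 1.0, "min": 60.0, "m": 60.0, "h": 3600.0}
_TIME = re.compile(r"^\s*(\d+)\s*(ms|s|min|m|h)\s*$")


def to_seconds(value: str) -> float:
    """Parse strings such as '120s', '2min', or '10000ms' into seconds."""
    m = _TIME.match(value.lower())
    if not m:
        raise ValueError(f"expected an integer time with a unit, got {value!r}")
    return int(m.group(1)) * _UNITS[m.group(2)]


def render_spark_conf(network_timeout: str = "300s",
                      heartbeat_interval: str = "10s") -> str:
    """Render 'key value' lines for a cluster Spark config, rejecting a
    heartbeat interval that is not at least 10x smaller than the timeout."""
    if to_seconds(heartbeat_interval) * 10 > to_seconds(network_timeout):
        raise ValueError("spark.executor.heartbeatInterval should be much "
                         "smaller than spark.network.timeout")
    return "\n".join([
        f"spark.network.timeout {network_timeout}",
        f"spark.executor.heartbeatInterval {heartbeat_interval}",
    ])


print(render_spark_conf())
# prints:
# spark.network.timeout 300s
# spark.executor.heartbeatInterval 10s
```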