High severity · cluster
Power BI Refresh Error: DRIVER_NOT_RESPONDING
What does this error mean?
The Databricks driver is running but has stopped responding to heartbeats from the control plane. Unlike COMMUNICATION_LOST, the driver process is still alive; it is simply too busy to answer, typically because of a long garbage collection pause or extreme CPU load.
Common causes
1. A long JVM garbage collection pause on the driver (common with large in-memory datasets)
2. Heavy Python-side processing blocking the main thread (e.g., a large pandas operation on the driver)
3. Collecting a very large dataset to the driver using collect() or toPandas()
4. Broadcasting an extremely large variable to all executors, blocking the driver
5. CPU starvation on the driver node due to excessive task scheduling overhead
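Cause 3 can often be guarded against directly in notebook code. The following is a minimal sketch, not a Databricks or Spark API: `to_pandas_guarded`, `ResultTooLargeError`, and the `max_rows` cap are hypothetical names chosen for illustration. It assumes `df` is a PySpark DataFrame and uses only its `limit()`, `count()`, and `toPandas()` methods.

```python
class ResultTooLargeError(RuntimeError):
    """Raised when a result is too large to collect safely onto the driver."""


def to_pandas_guarded(df, max_rows=100_000):
    """Convert a Spark DataFrame to pandas only if it has at most max_rows rows.

    `df` is assumed to be a pyspark.sql.DataFrame. Only limit(), count() and
    toPandas() are called, so this module does not need to import pyspark.
    The default cap of 100,000 rows is an illustrative assumption.
    """
    # Count at most max_rows + 1 rows: enough to tell whether the cap is
    # exceeded without counting the whole dataset.
    probe = df.limit(max_rows + 1).count()
    if probe > max_rows:
        raise ResultTooLargeError(
            f"result has more than {max_rows} rows; aggregate or filter on "
            "the executors before collecting to the driver"
        )
    return df.toPandas()
```

Row count is only a rough proxy for memory use: a small number of very wide rows can still strain the driver, so the cap should be chosen with the row width in mind.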
How to fix it
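1. Check the driver logs for GC pause durations. If pauses exceed 60 seconds, the driver may appear unresponsive.
2. Avoid calling collect() or toPandas() on large datasets. Keep the processing distributed on the executors and bring only filtered or aggregated results to the driver.
3. Give the driver more memory and tune its garbage collector: choose a larger driver node type or raise spark.driver.memory, and add '-XX:+UseG1GC' to spark.driver.extraJavaOptions. Do not put '-Xmx' in spark.driver.extraJavaOptions; Spark does not allow the maximum heap size to be set through that option.
4. Break up large operations that run entirely on the driver into smaller distributed operations.
5. If occasional GC pauses are acceptable, increase the heartbeat timeout threshold by setting spark.network.timeout above its 120s default.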
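The configuration steps above can be sketched as a small helper that assembles the Spark settings and rejects the two combinations Spark's documentation warns about. This is an illustrative sketch only: `driver_spark_conf` is a hypothetical function, and the 300-second timeout and 30-second heartbeat interval are assumed example values, not recommendations. The property `spark.executor.heartbeatInterval` is not mentioned above; it is included because Spark's documentation says it should be significantly less than `spark.network.timeout`.

```python
def driver_spark_conf(network_timeout_s=300, heartbeat_interval_s=30,
                      extra_java_options=("-XX:+UseG1GC",)):
    """Build Spark conf entries for a driver that has occasional GC pauses.

    All default values are illustrative assumptions.
    """
    # Spark does not allow -Xmx in spark.driver.extraJavaOptions; heap size
    # is controlled by spark.driver.memory (or the driver node type).
    for opt in extra_java_options:
        if opt.startswith("-Xmx"):
            raise ValueError("set heap size via spark.driver.memory, not -Xmx")

    # The executor heartbeat interval must stay below the network timeout.
    if heartbeat_interval_s >= network_timeout_s:
        raise ValueError("heartbeat interval must be less than network timeout")

    return {
        "spark.driver.extraJavaOptions": " ".join(extra_java_options),
        "spark.network.timeout": f"{network_timeout_s}s",
        "spark.executor.heartbeatInterval": f"{heartbeat_interval_s}s",
    }


if __name__ == "__main__":
    # Print one "key value" pair per line, the layout commonly used when
    # pasting settings into a cluster's Spark configuration.
    for key, value in driver_spark_conf().items():
        print(f"{key} {value}")
```

These are launch-time settings, so they belong in the cluster's Spark configuration and take effect after a restart rather than being set from a running notebook.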