MetricSign
High severity | cluster

Power BI Refresh Error:
COMMUNICATION_LOST

What does this error mean?

The Databricks control plane lost communication with the cluster. The cluster was running but became unreachable. Typical triggers are network issues, a crash of the driver VM, or the driver process being killed by the Linux OOM (out-of-memory) killer.

Common causes

  • The driver node was killed by the Linux OOM (out-of-memory) killer due to memory pressure
  • A network partition between the cluster and the Databricks control plane
  • The driver VM crashed due to hardware failure on the underlying host
  • A kernel panic or OS-level crash on the driver node
  • A spot instance was terminated without a graceful shutdown signal

How to fix it

  1. Check driver logs for OOM killer messages (look for 'Out of memory: Kill process' in syslog; newer kernels log 'Out of memory: Killed process')
  2. Review memory usage metrics for the cluster. If the driver was consistently at high memory, increase the driver node size
  3. Check cloud provider health for the availability zone to rule out infrastructure issues
  4. Add driver memory overhead configuration if the workload runs large Spark collect() operations or broadcasts large DataFrames
  5. Consider using a dedicated driver node type with more memory for memory-intensive workloads
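
The "consistently at high memory" check in step 2 can be made concrete with a small helper. This is a minimal sketch, not a Databricks API: the function name, the 85% threshold, and the 80% share are all assumptions chosen for illustration, and the samples are expected to be used-memory fractions (0.0 to 1.0) that you have exported from your own cluster metrics.

```python
from typing import Sequence


def driver_memory_consistently_high(
    used_fractions: Sequence[float],
    threshold: float = 0.85,   # assumed cut-off for "high" memory usage
    min_share: float = 0.8,    # assumed share of samples that must be high
) -> bool:
    """Return True when at least `min_share` of the samples are at or
    above `threshold`. An empty series is treated as "not high"."""
    if not used_fractions:
        return False
    high = sum(1 for value in used_fractions if value >= threshold)
    return high / len(used_fractions) >= min_share


# Illustrative samples only, not measurements from a real cluster.
print(driver_memory_consistently_high([0.90, 0.92, 0.88, 0.95]))
print(driver_memory_consistently_high([0.30, 0.90, 0.40, 0.50]))
```

If the helper returns True for the window before the failure, a larger driver node is the more likely fix; a single late spike points instead at one oversized collect() or broadcast.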

Frequently asked questions

How do I know if it was an OOM kill?

Check the cluster's memory usage metrics (Ganglia on older Databricks Runtime versions) for the period just before the failure, and search the driver system logs for 'oom'.
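
Searching the logs by hand works, but a short script makes the check repeatable. The sketch below assumes you have already downloaded the driver's system log to a local file; the function names are hypothetical, and the patterns cover the kernel wordings commonly seen around an OOM kill, which vary between kernel versions.

```python
import re
from typing import Iterable, List

# Kernel messages typically logged around an OOM kill. Older kernels write
# "Kill process", newer ones "Killed process"; both are matched.
OOM_PATTERNS = re.compile(
    r"Out of memory: Kill(ed)? process|invoked oom-killer|oom-kill:",
    re.IGNORECASE,
)


def find_oom_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines that look like OOM killer activity."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERNS.search(line)]


def scan_log_file(path: str) -> List[str]:
    """Scan a downloaded log file (path supplied by the caller)."""
    with open(path, errors="replace") as handle:
        return find_oom_lines(handle)


# Illustrative lines only, not taken from a real cluster.
sample = [
    "kernel: java invoked oom-killer: gfp_mask=0x0, order=0",
    "kernel: Out of memory: Killed process 4242 (java)",
    "INFO driver heartbeat ok",
]
for hit in find_oom_lines(sample):
    print(hit)
```

An empty result does not rule out an OOM kill, because the node may have become unreachable before the log was flushed. Treat a match as strong evidence and a miss as inconclusive.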

Can I prevent this with auto-scaling?

Auto-scaling adds worker (executor) nodes but does not help with driver memory issues. The driver always runs on a single node, so the fix is a larger driver node type.
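
To illustrate the distinction, the sketch below builds a cluster specification in which the driver node type is set independently of the autoscaling workers. It assumes the field names used by the Databricks Clusters API (`node_type_id`, `driver_node_type_id`, `autoscale`); the helper function, the cluster name, and the specific node types and runtime version are placeholders to replace with values valid for your workspace and cloud.

```python
import json


def build_cluster_spec(worker_type: str, driver_type: str,
                       min_workers: int, max_workers: int) -> dict:
    """Assemble a cluster spec with a driver type separate from the workers."""
    return {
        "cluster_name": "refresh-cluster",      # hypothetical name
        "spark_version": "13.3.x-scala2.12",    # example runtime; adjust as needed
        "node_type_id": worker_type,            # applies to autoscaled workers
        "driver_node_type_id": driver_type,     # sized independently of workers
        "autoscale": {
            "min_workers": min_workers,
            "max_workers": max_workers,
        },
    }


# Example node types are Azure VM sizes used purely for illustration.
spec = build_cluster_spec("Standard_DS3_v2", "Standard_E8s_v3", 2, 8)
print(json.dumps(spec, indent=2))
```

Raising `max_workers` in this spec changes nothing about the driver; only `driver_node_type_id` determines how much memory the driver has.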
