High severity · streaming
Databricks Structured Streaming Error:
CHECKPOINT_CORRUPTED
What does this error mean?
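The Structured Streaming checkpoint directory contains incomplete or inconsistent metadata, so the query cannot resume from its last committed offset. The checkpoint records the source offsets planned for each micro-batch (in offsets/) and marks each batch that finished (in commits/). When these logs are damaged or disagree, Databricks raises this error rather than risk processing data out of order or skipping records.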
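To make "last committed offset" concrete: a checkpoint keeps two logs whose files are named by micro-batch ID, where offsets/<id> is written when a batch is planned and commits/<id> after it finishes. The function below is a simplified, hypothetical model (`resume_batch` is not a Spark API) of how a restarted query picks its next batch from those two logs. Spark's real restart logic handles more cases, and the error for a commit log that runs ahead of the offset log is this sketch's own consistency check, not a documented Spark behavior.

```python
from typing import Iterable, Optional


def resume_batch(offset_ids: Iterable[int], commit_ids: Iterable[int]) -> Optional[int]:
    """Return the batch ID a restarted query would run next (simplified model).

    offset_ids: batch IDs found in <checkpoint>/offsets
    commit_ids: batch IDs found in <checkpoint>/commits
    Returns None when there is no offset log, i.e. a fresh query that
    starts from its configured starting offsets.
    """
    offsets = set(offset_ids)
    commits = set(commit_ids)
    if not offsets:
        return None  # nothing planned yet: fresh start
    latest_offset = max(offsets)
    latest_commit = max(commits) if commits else None
    if latest_commit is not None and latest_commit > latest_offset:
        # A finished batch with no planned offsets: this sketch treats it as corruption.
        raise ValueError("commit log is ahead of offset log; checkpoint looks inconsistent")
    if latest_commit == latest_offset:
        return latest_offset + 1  # last batch finished, so plan a new one
    # The newest planned batch never committed: re-run it with its recorded offsets.
    return latest_offset
```

For example, offsets for batches 0 to 2 with commits for 0 and 1 means batch 2 was in flight when the query stopped, so the model re-runs batch 2 rather than planning batch 3.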
Common causes
1. A checkpoint write was interrupted mid-flight by cluster termination or spot-instance preemption.
2. The checkpoint location is on object storage (such as S3, or ADLS without hierarchical namespace) that lacks atomic renames, so a partially written file was left in place instead of being committed atomically.
3. The checkpoint directory was manually modified or partially deleted.
4. Another streaming query wrote to the same checkpoint path concurrently; each query needs its own checkpoint location.
5. File system permissions changed after the checkpoint was created, leaving some checkpoint files unreadable.
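Causes 1 and 2 both come down to atomicity. On a POSIX file system, a common way to make a file write all-or-nothing is to write a temporary file in the same directory and then rename it over the target, so a reader sees either the old file or the complete new one. Spark's checkpoint file handling broadly relies on a similar write-then-rename step. The hypothetical helper `atomic_write` below illustrates the pattern with Python's `os.replace`; it is not Spark's implementation. Object stores such as S3 do not offer an atomic rename (clients typically emulate it with a copy followed by a delete), which is why the guarantee does not carry over to them.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Write data to path so readers never observe a partially written file (POSIX)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the rename stays on one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # force the bytes to disk before publishing them
        os.replace(tmp_path, path)  # atomic when source and target share a file system
    except BaseException:
        # On any failure, discard the temp file; the existing target is left untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the process dies before `os.replace`, only a stray temporary file is left behind and the previous version of the target is still intact; a plain `open(path, "wb")` would instead leave a truncated file in its place.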
How to fix it
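1. Back up the entire checkpoint directory before making any changes.
2. Inspect the <checkpoint>/offsets and <checkpoint>/commits directories for missing or zero-byte files. Files in both directories are named by batch ID, and every batch in offsets normally has a matching file in commits, except possibly the most recent batch, which may have been in flight when the query stopped.
3. If the checkpoint is unrecoverable, delete the entire checkpoint directory (or point the query at a new, empty checkpoint path) and restart the query. Without a checkpoint, the query starts from the source's configured starting position instead of its last committed batch, so depending on that setting it may reprocess records (the sink must tolerate duplicates) or skip them.
4. If full reprocessing is not required, start from the latest offsets: .option('startingOffsets', 'latest'). This Kafka source option (whose default for streaming queries is already latest) applies only when a query starts without an existing checkpoint, and it skips any records that arrived between the failure and the restart. For Delta table sources, the equivalent options are startingVersion and startingTimestamp.
5. Keep checkpoints on storage that supports atomic renames, such as Azure Data Lake Storage Gen2 with hierarchical namespace enabled. A DBFS path is stored in the workspace's underlying cloud storage, so it is only as atomic as that storage.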
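Steps 1 and 2 can be scripted. The sketch below assumes the checkpoint is reachable through the local file API (for example a locally mounted path) and that log files are named purely by integer batch ID. `backup_checkpoint` and `inspect_checkpoint` are hypothetical helpers, not Databricks or Spark APIs, and the "uncommitted older batches" check is a heuristic based on the rule that only the newest batch is normally expected to lack a commit file.

```python
import os
import shutil
import time
from typing import Dict, List


def backup_checkpoint(checkpoint_dir: str) -> str:
    """Step 1: copy the whole checkpoint to a timestamped sibling directory."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup_dir = f"{checkpoint_dir.rstrip(os.sep)}.backup-{stamp}"
    shutil.copytree(checkpoint_dir, backup_dir)  # raises if backup_dir already exists
    return backup_dir


def _batch_files(log_dir: str) -> Dict[int, int]:
    """Map batch ID to file size, skipping names that are not purely digits
    (for example hidden temporary or checksum files)."""
    sizes: Dict[int, int] = {}
    if not os.path.isdir(log_dir):
        return sizes
    for name in os.listdir(log_dir):
        if name.isdigit():
            sizes[int(name)] = os.path.getsize(os.path.join(log_dir, name))
    return sizes


def inspect_checkpoint(checkpoint_dir: str) -> Dict[str, List[int]]:
    """Step 2: report zero-byte log files and batches whose logs do not line up."""
    offsets = _batch_files(os.path.join(checkpoint_dir, "offsets"))
    commits = _batch_files(os.path.join(checkpoint_dir, "commits"))
    latest_offset = max(offsets, default=-1)
    return {
        "zero_byte_offsets": sorted(b for b, size in offsets.items() if size == 0),
        "zero_byte_commits": sorted(b for b, size in commits.items() if size == 0),
        # A commit with no offset file means offset metadata has gone missing.
        "commits_without_offsets": sorted(set(commits) - set(offsets)),
        # Only the newest batch may legitimately lack a commit (it was in flight).
        "uncommitted_older_batches": sorted(
            b for b in offsets if b not in commits and b < latest_offset
        ),
    }
```

Take the backup first, then run `inspect_checkpoint` and treat any non-empty list as a sign of damage. An empty report does not prove the checkpoint is healthy: files can be non-empty yet unparseable, and subdirectories such as state/ and sources/ are not examined here.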