High severity · Data quality
Power BI Refresh Error: BAD_FILE_FORMAT
What does this error mean?
Databricks could not read a file because its actual format does not match the declared format used in the read operation. A Parquet reader receiving a CSV file, or a Delta reader pointed at plain JSON files, will raise this error.
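The mismatch is usually visible in the first few bytes of the file, because Parquet, ORC, and gzip all begin with fixed magic numbers while plain CSV and JSON have none. As a rough illustration (the `sniff_format` helper is our own, not a Databricks utility):

```python
def sniff_format(path):
    """Guess a file's real format from its leading magic bytes."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head == b"PAR1":          # Parquet files start (and end) with PAR1
        return "parquet"
    if head[:3] == b"ORC":       # ORC files start with the bytes 'ORC'
        return "orc"
    if head[:2] == b"\x1f\x8b":  # gzip magic number, e.g. gzip-compressed CSV
        return "gzip"
    return "unknown"             # plain CSV/JSON carries no magic number
```

A `.parquet` file that sniffs as `gzip` is exactly the situation that raises BAD_FILE_FORMAT.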
Common causes
1. The file extension and the actual format do not match (e.g. a .parquet file that is actually gzip-compressed CSV).
2. An upstream ETL job wrote files in the wrong format to a location that Databricks reads as a specific format.
3. A Parquet or ORC file was corrupted during write (incomplete footer or missing schema block).
4. A Delta table location contains non-Delta files that were placed there manually.
5. Auto Loader inferred the format incorrectly during initial schema inference.
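Cause 3 in particular can be checked without Spark: a structurally complete Parquet file carries the 4-byte magic `PAR1` at both the start and the end, so a write that was cut off mid-stream is detectable from the trailing bytes alone. A minimal sketch (the helper name is illustrative):

```python
def parquet_footer_intact(path):
    """Return True if the file has Parquet's PAR1 magic at both ends.

    A file that fails this check was likely truncated during write and
    will fail when a Parquet reader tries to parse its footer.
    """
    with open(path, "rb") as f:
        if f.read(4) != b"PAR1":
            return False
        f.seek(-4, 2)  # jump to the last 4 bytes of the file
        return f.read(4) == b"PAR1"
```

Note this only validates the framing, not the footer metadata itself, but it is enough to catch incomplete writes cheaply.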
How to fix it
1. Verify the actual format of a suspect file: run file <filename> on Linux, or download the file and inspect its header bytes.
2. Specify the format explicitly in the reader, e.g. spark.read.format('parquet').load(path), so Spark does not infer it from the path or extension.
3. Remove or quarantine corrupted files from the source path before re-running the job.
4. For Auto Loader, set cloudFiles.format explicitly instead of relying on inference.
5. If a Delta table location contains stray non-Delta files, remove them from the directory before reading; VACUUM can help, but it only deletes files not tracked by the transaction log and only after the retention window has passed.
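Steps 1 and 3 above can be combined into a small pre-flight scan. This is a sketch under assumptions, not an official tool: it assumes the source path is a local or mounted directory of Parquet files, and the quarantine directory name is our own choice.

```python
import os
import shutil

def quarantine_bad_parquet(src_dir, quarantine_dir):
    """Move files lacking Parquet's PAR1 magic at both ends into quarantine.

    Returns the list of quarantined file names so the run can be logged
    and the files inspected before the job is re-run.
    """
    os.makedirs(quarantine_dir, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(src_dir)):
        path = os.path.join(src_dir, name)
        if not os.path.isfile(path):
            continue  # leave subdirectories (e.g. _delta_log) alone
        with open(path, "rb") as f:
            data = f.read()
        if not (data[:4] == b"PAR1" and data[-4:] == b"PAR1"):
            shutil.move(path, os.path.join(quarantine_dir, name))
            moved.append(name)
    return moved
```

Reading each file whole keeps the sketch simple; for large files you would read only the first and last four bytes, as in the footer check above.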