MetricSign
High severity · Data quality

Power BI Refresh Error:
BAD_FILE_FORMAT

What does this error mean?

Databricks could not read a file because its actual format does not match the declared format used in the read operation. A Parquet reader receiving a CSV file, or a Delta reader pointed at plain JSON files, will raise this error.

Common causes

  • The file extension and actual format do not match (e.g. a .parquet file that is actually gzip-compressed CSV)
  • An upstream ETL job wrote files in the wrong format to a location that Databricks reads as a specific format
  • A Parquet or ORC file was corrupted during write (incomplete footer or missing schema block)
  • A Delta table location contains non-Delta files that were placed there manually
  • Auto Loader inferred the format incorrectly during initial schema inference
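The first cause above can often be confirmed without Spark: the leading bytes of a file identify its real format regardless of its extension. A minimal sketch in plain Python (the magic-byte table is an assumption covering only the formats mentioned in this article):

```python
import gzip
import os
import tempfile

# Magic bytes for formats commonly confused in data lakes.
MAGIC = {
    b"PAR1": "parquet",
    b"ORC": "orc",
    b"\x1f\x8b": "gzip",  # gzip-compressed CSV often masquerades as .parquet
}

def sniff_format(path: str) -> str:
    """Return the real on-disk format based on the file's leading magic bytes."""
    with open(path, "rb") as f:
        head = f.read(4)
    for magic, fmt in MAGIC.items():
        if head.startswith(magic):
            return fmt
    return "unknown (possibly plain-text CSV/JSON)"

# Demo: a gzip-compressed CSV saved with a .parquet extension (cause 1 above).
tmp = tempfile.mkdtemp()
bad = os.path.join(tmp, "data.parquet")
with gzip.open(bad, "wb") as f:
    f.write(b"id,name\n1,alice\n")

print(sniff_format(bad))  # prints "gzip" despite the .parquet extension
```

Running this sniffer over a sample of files before pointing a Parquet reader at the path turns a mid-job BAD_FILE_FORMAT failure into an early, explicit check.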

How to fix it

  1. Verify the actual format of a suspect file: run file <filename> on Linux, or download the file and inspect its header bytes.
  2. Re-specify the format explicitly in the reader: spark.read.format('parquet').load(path).
  3. Remove or quarantine corrupted files from the source path before re-running the job.
  4. For Auto Loader, set cloudFiles.format explicitly instead of relying on inference.
  5. If a Delta table location contains stray non-Delta files, use VACUUM to remove untracked files and clean up the directory before reading.
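Steps 1 and 3 can be combined into a single quarantine pass before the job re-runs. A sketch under the assumption that the source directory should contain only Parquet files (quarantine_mismatched and EXPECTED_MAGIC are illustrative names, not a Databricks API):

```python
import os
import shutil
import tempfile

# Expected leading magic bytes per declared format (an assumption; extend as needed).
EXPECTED_MAGIC = {"parquet": b"PAR1", "orc": b"ORC"}

def quarantine_mismatched(src_dir: str, declared_format: str, quarantine_dir: str) -> list:
    """Move files whose header does not match the declared format
    into quarantine_dir; return the names of the files moved."""
    os.makedirs(quarantine_dir, exist_ok=True)
    magic = EXPECTED_MAGIC[declared_format]
    moved = []
    for name in os.listdir(src_dir):
        path = os.path.join(src_dir, name)
        with open(path, "rb") as f:
            head = f.read(len(magic))
        if head != magic:
            shutil.move(path, os.path.join(quarantine_dir, name))
            moved.append(name)
    return moved

# Demo: one Parquet-framed file and one stray CSV in the same source path.
src = tempfile.mkdtemp()
qdir = tempfile.mkdtemp()
with open(os.path.join(src, "part-0000.parquet"), "wb") as f:
    f.write(b"PAR1" + b"\x00" * 16 + b"PAR1")  # minimal Parquet-like framing
with open(os.path.join(src, "stray.csv"), "wb") as f:
    f.write(b"id,name\n1,alice\n")

moved = quarantine_mismatched(src, "parquet", qdir)
print(moved)  # prints ['stray.csv']
```

After the pass, the source path contains only files the Parquet reader can open, and the quarantined files are preserved for upstream debugging rather than deleted.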

Frequently asked questions

How do I detect format mismatches before they fail a job?

Auto Loader's cloudFiles.validateOptions setting (enabled by default) only validates option names, not file contents, so it will not catch a format mismatch on its own. To surface mismatches before a job fails mid-run, set cloudFiles.format explicitly and add a schema validation notebook that reads the first file in each batch before the main pipeline runs.

Can a corrupted Parquet footer cause BAD_FILE_FORMAT?

Yes. Parquet stores its schema and row-group metadata in a footer at the end of the file, so a missing or truncated footer makes the whole file unreadable to the Parquet reader. Rewrite the file from source or restore it from a backup.
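A quick integrity check for this case: a structurally complete Parquet file both begins and ends with the 4-byte magic PAR1, so a file whose trailer is missing was truncated mid-write. A minimal sketch (has_valid_parquet_framing is an illustrative helper; it checks the framing only, not the footer's contents):

```python
import os
import tempfile

def has_valid_parquet_framing(path: str) -> bool:
    """Check for the PAR1 magic at both ends of the file.
    Catches truncated footers; does not validate the schema block itself."""
    with open(path, "rb") as f:
        if f.read(4) != b"PAR1":
            return False
        f.seek(-4, os.SEEK_END)
        return f.read(4) == b"PAR1"

# Demo: a complete file vs. one whose write died before the footer.
tmp = tempfile.mkdtemp()
good = os.path.join(tmp, "good.parquet")
with open(good, "wb") as f:
    f.write(b"PAR1" + b"\x00" * 32 + b"PAR1")

truncated = os.path.join(tmp, "truncated.parquet")
with open(truncated, "wb") as f:
    f.write(b"PAR1" + b"\x00" * 32)  # footer never written

print(has_valid_parquet_framing(good), has_valid_parquet_framing(truncated))
# prints: True False
```

Running this over a source path isolates truncated files so only those need to be rewritten from source or restored from backup.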
