Severity: Medium · Category: data quality

Power BI Refresh Error:
MALFORMED_RECORD_IN_PARSING

What does this error mean?

The JSON or CSV parser encountered a record that does not conform to the expected schema or format. In FAILFAST mode, Databricks raises this error immediately and aborts the read; in PERMISSIVE mode, the malformed fields are set to NULL and the raw record is captured in the _corrupt_record column so the read can continue.
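
The difference between the two modes can be illustrated with a small pure-Python sketch. This is a toy model of the behavior, not Spark's implementation; the function name and sample data are invented for illustration.

```python
import json

def read_json_lines(lines, schema_fields, mode="PERMISSIVE"):
    """Toy model of Spark's parser modes (not the real implementation).

    FAILFAST aborts on the first malformed line; PERMISSIVE nulls out
    the fields and keeps the raw text in _corrupt_record.
    """
    rows = []
    for line in lines:
        try:
            record = json.loads(line)
            rows.append({f: record.get(f) for f in schema_fields})
        except json.JSONDecodeError:
            if mode == "FAILFAST":
                raise ValueError(f"MALFORMED_RECORD_IN_PARSING: {line!r}")
            # PERMISSIVE: null every declared field, keep the raw record
            row = {f: None for f in schema_fields}
            row["_corrupt_record"] = line
            rows.append(row)
    return rows

lines = ['{"id": 1, "name": "ok"}', '{"id": 2, "name": ']  # second line truncated
rows = read_json_lines(lines, ["id", "name"])
print(rows[1]["_corrupt_record"])  # the truncated raw line survives for inspection
```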

Common causes

  • A JSON field contains a value that cannot be cast to the declared schema type (e.g. a string where an integer is expected)
  • A CSV row has a different number of fields than the header row
  • A multi-line JSON record spans several lines and spark.read.json (or spark.read.csv) is used without the multiLine option
  • An upstream system changed its output format without updating the reader schema
  • A file is partially corrupted or truncated mid-record
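
The ragged-row cause (a CSV row with the wrong field count) can be detected before the file ever reaches Spark with a quick pre-check. A minimal sketch using Python's standard csv module; the sample data is invented:

```python
import csv
import io

def find_ragged_rows(csv_text):
    """Return (line_number, row) for rows whose field count differs from the header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    return [(i, row) for i, row in enumerate(reader, start=2)
            if len(row) != len(header)]

sample = "id,name,amount\n1,alice,10\n2,bob\n3,carol,30,extra\n"
print(find_ragged_rows(sample))  # lines 3 and 4 have 2 and 4 fields instead of 3
```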

How to fix it

  1. Switch to PERMISSIVE mode: spark.read.option('mode', 'PERMISSIVE'), then inspect the _corrupt_record column.
  2. Use DROPMALFORMED to silently skip bad records; investigate the dropped records separately with a PERMISSIVE read and the _corrupt_record output.
  3. Validate schema compatibility between the reader's schema and a sample of new incoming files before processing.
  4. For multi-line JSON, add .option('multiLine', 'true') to the reader.
  5. Add a schema validation step upstream (e.g. Great Expectations or Delta Live Tables expectations) to reject malformed files before the pipeline runs.
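
Step 3 (validating schema compatibility against a sample) can be sketched in plain Python. The declared schema, field names, and sample records below are hypothetical; in practice the expected types would come from your reader's schema definition:

```python
import json

# Hypothetical declared reader schema: field name -> expected Python type
EXPECTED = {"id": int, "name": str, "amount": float}

def schema_mismatches(sample_records):
    """Compare a sample of parsed records against the declared schema.

    Returns (field, value) pairs that would fail the cast at read time.
    """
    problems = []
    for rec in sample_records:
        for field, expected_type in EXPECTED.items():
            value = rec.get(field)
            if value is not None and not isinstance(value, expected_type):
                problems.append((field, value))
    return problems

sample = [json.loads(s) for s in (
    '{"id": 1, "name": "ok", "amount": 9.5}',
    '{"id": "two", "name": "bad", "amount": 3.0}',  # id is a string, not an int
)]
print(schema_mismatches(sample))  # [('id', 'two')]
```

Running this against a small sample of each new file batch catches type drift (cause #1 and #4 above) before the full read fails.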

Frequently asked questions

How do I find out which records are malformed?

Read with mode PERMISSIVE and add _corrupt_record as a STRING column to your explicit schema (Spark only populates it when it is declared). Filter WHERE _corrupt_record IS NOT NULL to extract all bad rows for inspection.
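
The filter itself is the same idea as this sketch over the rows a PERMISSIVE read would return (the sample rows here are invented):

```python
# Rows as a PERMISSIVE read would return them (hypothetical sample):
# malformed rows have NULL fields and a populated _corrupt_record.
rows = [
    {"id": 1, "name": "ok", "_corrupt_record": None},
    {"id": None, "name": None, "_corrupt_record": '{"id": "x"'},
]

# Equivalent of WHERE _corrupt_record IS NOT NULL
bad = [r for r in rows if r["_corrupt_record"] is not None]
good = [r for r in rows if r["_corrupt_record"] is None]
print(len(bad), len(good))  # 1 1
```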

Does Auto Loader handle MALFORMED_RECORD_IN_PARSING differently?

Auto Loader uses the same underlying Spark reader options. Setting cloudFiles.schemaEvolutionMode to rescue routes unexpected columns and values that cannot be parsed into the declared type to a _rescued_data column instead of failing the stream.
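
Conceptually, rescue mode works like this sketch: declared columns pass through, and anything unexpected is serialized into _rescued_data. This is a toy model of the behavior, not Auto Loader's implementation; the declared columns and sample record are invented:

```python
import json

DECLARED = {"id", "name"}  # hypothetical declared schema columns

def rescue(record):
    """Toy model of rescue mode: declared columns pass through,
    unexpected keys land in _rescued_data as a JSON string."""
    row = {c: record.get(c) for c in sorted(DECLARED)}
    extras = {k: v for k, v in record.items() if k not in DECLARED}
    row["_rescued_data"] = json.dumps(extras) if extras else None
    return row

row = rescue({"id": 1, "name": "ok", "surprise": 42})
print(row["_rescued_data"])  # {"surprise": 42}
```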
