Severity: Medium. Category: Data quality.
Power BI Refresh Error:
MALFORMED_RECORD_IN_PARSING
What does this error mean?
The JSON or CSV parser encountered a record that does not conform to the expected schema or format. In FAILFAST mode, Databricks raises this error immediately; in PERMISSIVE mode (the default), the parser sets the malformed record's fields to NULL and, if the schema declares a _corrupt_record column, stores the raw record text there for inspection.
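The three parser modes (PERMISSIVE, DROPMALFORMED, FAILFAST) can be sketched in plain Python. This is a stdlib illustration of the semantics only, not Spark's implementation; the function name parse_records and the use of ValueError for FAILFAST are choices made here for the sketch.

```python
import json

def parse_records(lines, fields, mode="PERMISSIVE"):
    """Parse newline-delimited JSON, mimicking Spark's parser modes."""
    rows = []
    for line in lines:
        try:
            record = json.loads(line)
            rows.append({f: record.get(f) for f in fields})
        except json.JSONDecodeError:
            if mode == "FAILFAST":
                # Spark raises its own exception here; ValueError stands in for it
                raise ValueError(f"Malformed record: {line!r}")
            if mode == "DROPMALFORMED":
                continue  # silently skip the bad record
            # PERMISSIVE: NULL out the fields, keep the raw text for inspection
            row = {f: None for f in fields}
            row["_corrupt_record"] = line
            rows.append(row)
    return rows

lines = ['{"id": 1, "name": "ok"}', '{"id": 2, "name": ']  # second line truncated
print(parse_records(lines, ["id", "name"]))  # PERMISSIVE keeps both rows
```

PERMISSIVE preserves row count at the cost of NULLs, DROPMALFORMED preserves cleanliness at the cost of silent data loss, and FAILFAST surfaces the problem immediately, which is what produces this refresh error.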
Common causes
- A JSON field contains a value that cannot be cast to the declared schema type (e.g. a string where an integer is expected)
- A CSV row has a different number of fields than the header row
- A multi-line JSON record spans several physical lines and spark.read.json (or the CSV reader) is used without the multiLine option
- An upstream system changed its output format without the reader schema being updated
- A file is partially corrupted or truncated mid-record
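The CSV field-count mismatch above is cheap to detect before the pipeline runs. A minimal stdlib check might look like this (find_malformed_csv_rows is a hypothetical helper, not a Spark or Databricks API):

```python
import csv
import io

def find_malformed_csv_rows(text):
    """Return (line_number, row) pairs whose field count differs from the header."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    bad = []
    # Header is physical line 1, so data rows start at line 2
    for lineno, row in enumerate(reader, start=2):
        if len(row) != len(header):
            bad.append((lineno, row))
    return bad

sample = "id,name,amount\n1,alice,10\n2,bob\n3,carol,30,EXTRA\n"
print(find_malformed_csv_rows(sample))  # lines 3 and 4 have the wrong field count
```

Running a check like this on a sample of each incoming file turns a mid-refresh FAILFAST error into an actionable report of exactly which lines are bad.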
How to fix it
- Switch the reader to PERMISSIVE mode, e.g. spark.read.option('mode', 'PERMISSIVE'), and inspect the _corrupt_record column to see the raw malformed records.
- Use DROPMALFORMED to skip bad records silently; note that dropped records are not written to _corrupt_record, so run a separate PERMISSIVE pass if you need to investigate them.
- Validate schema compatibility between the reader's schema and a sample of new incoming files before processing.
- For multi-line JSON, add .option('multiLine', 'true') to the reader.
- Add a schema validation step upstream (e.g. Great Expectations or Delta Live Tables expectations) to reject malformed files before the pipeline runs.