High severityexecution
Power BI Refresh Error:
MLFLOW_RUN_FAILED
What does this error mean?
An MLflow experiment run was terminated with a FAILED status, meaning the training or evaluation job did not complete successfully.
Common causes
- 1A Python exception in the training script terminated the run before it could log final metrics
- 2The cluster ran out of memory during model training on large datasets
- 3A dependency package version conflict caused an import error at startup
- 4The MLflow artifact storage location (S3/ADLS/GCS) is inaccessible or has insufficient permissions
How to fix it
- 1Step 1: Open the MLflow experiment in the Databricks UI, click the failed run, and check the 'System Metrics' and 'Tags' tabs for the error message.
- 2Step 2: Review the cluster driver logs for the full Python stack trace.
- 3Step 3: If memory-related, increase the cluster size or reduce the batch size / dataset sample.
- 4Step 4: If artifact storage fails, verify the storage account permissions for the MLflow artifact URI.
- 5Step 5: Re-run the experiment after fixing the root cause — MLflow run IDs are immutable, so a new run is always created.