High severitydata integrity
MySQL Error:
1062, Unique Constraint Violation
What does this error mean?
MySQL error 1062 fires when an INSERT or UPDATE attempts to write a value that already exists in a column declared as UNIQUE or PRIMARY KEY. The database rejects the entire statement and rolls back that row. In data-pipeline context this typically surfaces during a pipeline retry: the first run partially loaded rows, the retry attempts to insert the same primary keys again, and every duplicate row causes a 1062. The engineer sees the pipeline marked as Failed with the message `Duplicate entry 'X' for key 'PRIMARY'` in ADF activity output or in the MySQL slow/error log. Until the root cause is fixed, every subsequent run will fail at the same row.
Common causes
- 1ADF sink configured as Insert-only instead of Upsert: the MySQL sink dataset defaults to insert behavior. When a pipeline retries after a partial load, rows already written in the first attempt are inserted again, producing 1062 on every duplicate primary key.
- 2dbt incremental model missing unique_key: without a unique_key in the incremental config, dbt appends all source rows on each run. Any row that already exists in the target table triggers 1062, because dbt falls back to plain INSERT.
- 3Source data contains upstream duplicates: the staging table or API response delivers the same logical record more than once (e.g., CDC events replayed, or a FULL load joined with a DELTA load). Without a DISTINCT or ROW_NUMBER deduplication step the load layer inserts every copy.
- 4Race condition between concurrent pipeline runs: two ADF triggers or Fabric notebook runs start within seconds of each other, both read the same source snapshot, and both attempt to insert overlapping key ranges simultaneously. The second writer hits 1062 on rows the first already committed.
- 5REPLACE INTO used instead of INSERT ... ON DUPLICATE KEY UPDATE when the table has foreign-key children: REPLACE deletes the old row and inserts a new one. If a child table references the old row, MySQL may error or cascade-delete before 1062 is reached, but on tables without FK cascades the pattern still produces duplicate-key errors when the sequence generator reuses IDs.
- 6Auto-increment exhaustion or manual ID override: a script that explicitly sets the id column to a fixed value (e.g., for backfill) collides with rows inserted by the application layer that already claimed those IDs. Common when replaying historical data into a live table.
- 7Schema migration added a UNIQUE index on an existing column that already contains duplicates: ALTER TABLE ... ADD UNIQUE fails with 1062 at migration time, but the same error appears at insert time if the migration succeeded but the underlying data was not cleaned first.
How to fix it
- 1Step 1 — Read the error message precisely. `Duplicate entry '42' for key 'PRIMARY'` tells you the exact value and constraint name. Run `SHOW CREATE TABLE your_table;` to see which column(s) form that key, then query `SELECT id, COUNT(*) FROM your_table GROUP BY id HAVING COUNT(*) > 1;` to confirm whether duplicates already exist in the target.
- 2Step 2 — Switch ADF sink to Upsert write behavior. In the ADF Copy Activity → Sink tab, set Write behavior to Upsert, then specify the key columns under Key columns. ADF will generate `INSERT ... ON DUPLICATE KEY UPDATE` for MySQL sinks, making every pipeline run idempotent.
- 3Step 3 — For raw SQL or Stored Procedure activities use INSERT ... ON DUPLICATE KEY UPDATE explicitly: `INSERT INTO orders (id, status, updated_at) VALUES (42, 'shipped', NOW()) ON DUPLICATE KEY UPDATE status = VALUES(status), updated_at = VALUES(updated_at);` This is safe for retries because re-running the same values is a no-op.
- 4Step 4 — In dbt, add unique_key to the incremental model config block: `{{ config(materialized='incremental', unique_key='order_id') }}`. The MySQL/MariaDB dbt adapter translates this into INSERT ... ON DUPLICATE KEY UPDATE behind the scenes. Validate with `dbt run --select your_model --full-refresh` on a dev target first.
- 5Step 5 — Deduplicate source data before the load step. Use a CTE with ROW_NUMBER to keep only the latest record per key: `INSERT INTO target SELECT id, col1, col2 FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) AS rn FROM staging) t WHERE rn = 1 ON DUPLICATE KEY UPDATE col1 = VALUES(col1);`
- 6Step 6 — If the error occurs during an ALTER TABLE ... ADD UNIQUE migration, clean duplicates first. Find and delete or merge them: `DELETE t1 FROM your_table t1 INNER JOIN your_table t2 WHERE t1.id > t2.id AND t1.email = t2.email;` Then re-run the migration.
- 7Step 7 — For concurrent pipeline race conditions, serialize runs by adding a pipeline dependency or Fabric notebook mutex. In ADF use a trigger dependency or set the pipeline to allow only one active run at a time under Settings → Concurrency → Maximum concurrent runs = 1.
Example log output
com.microsoft.sqlserver.jdbc.SQLServerException: Duplicate entry '10047' for key 'orders.PRIMARY'
Activity 'Copy_orders_to_mysql' failed: ErrorCode=UserErrorDatabaseOperationFailed, Type=Microsoft.DataFactory.Common.ActivityException
[MySQL][ODBC 8.0(w) Driver][mysqld-8.0.32]Duplicate entry '10047' for key 'orders.PRIMARY'