ADF sink configured as Insert-only instead of Upsert: the MySQL sink dataset defaults to insert behavior. When a pipeline retries after a partial load, rows already written in the first attempt are inserted again, producing 1062 on every duplicate primary key.. dbt incremental model missing unique_key: without a unique_key in the incremental config, dbt appends all source rows on each run. Any row that already exists in the target table triggers 1062, because dbt falls back to plain INSERT.. Source data contains upstream duplicates: the staging table or API response delivers the same logical record more than once (e.g., CDC events replayed, or a FULL load joined with a DELTA load). Without a DISTINCT or ROW_NUMBER deduplication step the load layer inserts every copy.. Race condition between concurrent pipeline runs: two ADF triggers or Fabric notebook runs start within seconds of each other, both read the same source snapshot, and both attempt to insert overlapping key ranges simultaneously. The second writer hits 1062 on rows the first already committed.. REPLACE INTO used instead of INSERT ... ON DUPLICATE KEY UPDATE when the table has foreign-key children: REPLACE deletes the old row and inserts a new one. If a child table references the old row, MySQL may error or cascade-delete before 1062 is reached, but on tables without FK cascades the pattern still produces duplicate-key errors when the sequence generator reuses IDs.. Auto-increment exhaustion or manual ID override: a script that explicitly sets the id column to a fixed value (e.g., for backfill) collides with rows inserted by the application layer that already claimed those IDs. Common when replaying historical data into a live table.. Schema migration added a UNIQUE index on an existing column that already contains duplicates: ALTER TABLE ... ADD UNIQUE fails with 1062 at migration time, but the same error appears at insert time if the migration succeeded but the underlying data was not cleaned first.

Step 1 — Read the error message precisely. `Duplicate entry '42' for key 'PRIMARY'` tells you the exact value and constraint name. Run `SHOW CREATE TABLE your_table;` to see which column(s) form that key, then query `SELECT id, COUNT(*) FROM your_table GROUP BY id HAVING COUNT(*) > 1;` to confirm whether duplicates already exist in the target.. Step 2 — Switch ADF sink to Upsert write behavior. In the ADF Copy Activity → Sink tab, set Write behavior to Upsert, then specify the key columns under Key columns. ADF will generate `INSERT ... ON DUPLICATE KEY UPDATE` for MySQL sinks, making every pipeline run idempotent.. Step 3 — For raw SQL or Stored Procedure activities use INSERT ... ON DUPLICATE KEY UPDATE explicitly: `INSERT INTO orders (id, status, updated_at) VALUES (42, 'shipped', NOW()) ON DUPLICATE KEY UPDATE status = VALUES(status), updated_at = VALUES(updated_at);` This is safe for retries because re-running the same values is a no-op.. Step 4 — In dbt, add unique_key to the incremental model config block: `{{ config(materialized='incremental', unique_key='order_id') }}`. The MySQL/MariaDB dbt adapter translates this into INSERT ... ON DUPLICATE KEY UPDATE behind the scenes. Validate with `dbt run --select your_model --full-refresh` on a dev target first.. Step 5 — Deduplicate source data before the load step. Use a CTE with ROW_NUMBER to keep only the latest record per key: `INSERT INTO target SELECT id, col1, col2 FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) AS rn FROM staging) t WHERE rn = 1 ON DUPLICATE KEY UPDATE col1 = VALUES(col1);`. Step 6 — If the error occurs during an ALTER TABLE ... ADD UNIQUE migration, clean duplicates first. Find and delete or merge them: `DELETE t1 FROM your_table t1 INNER JOIN your_table t2 WHERE t1.id > t2.id AND t1.email = t2.email;` Then re-run the migration.. Step 7 — For concurrent pipeline race conditions, serialize runs by adding a pipeline dependency or Fabric notebook mutex. In ADF use a trigger dependency or set the pipeline to allow only one active run at a time under Settings → Concurrency → Maximum concurrent runs = 1.

High severitydata integrity

MySQL Error:
1062, Unique Constraint Violation

Impact

Every retry of the failed pipeline run will hit the same error until the write behavior is fixed — partial loads leave the target table in an inconsistent state and downstream dbt models or Power BI datasets will read stale or incomplete data until the issue is resolved.

When 1062 fires mid-batch, the Copy Activity or Stored Procedure halts immediately and marks the activity as Failed. Rows written before the failing row remain committed — there is no automatic rollback unless you wrap the load in an explicit transaction. This leaves the target table partially loaded. Downstream dbt models that run on a schedule will read this incomplete state and may produce incorrect aggregations or silently drop rows. Power BI datasets refreshing on top of those models will show stale metrics. SLA breaches accumulate with each failed retry until upsert logic is in place.

What does this error mean?

MySQL error 1062 fires when an INSERT or UPDATE attempts to write a value that already exists in a column declared as UNIQUE or PRIMARY KEY. The database rejects the entire statement and rolls back that row. In data-pipeline context this typically surfaces during a pipeline retry: the first run partially loaded rows, the retry attempts to insert the same primary keys again, and every duplicate row causes a 1062. The engineer sees the pipeline marked as Failed with the message `Duplicate entry 'X' for key 'PRIMARY'` in ADF activity output or in the MySQL slow/error log. Until the root cause is fixed, every subsequent run will fail at the same row.

Common causes

1ADF sink configured as Insert-only instead of Upsert: the MySQL sink dataset defaults to insert behavior. When a pipeline retries after a partial load, rows already written in the first attempt are inserted again, producing 1062 on every duplicate primary key.
2dbt incremental model missing unique_key: without a unique_key in the incremental config, dbt appends all source rows on each run. Any row that already exists in the target table triggers 1062, because dbt falls back to plain INSERT.
3Source data contains upstream duplicates: the staging table or API response delivers the same logical record more than once (e.g., CDC events replayed, or a FULL load joined with a DELTA load). Without a DISTINCT or ROW_NUMBER deduplication step the load layer inserts every copy.
4Race condition between concurrent pipeline runs: two ADF triggers or Fabric notebook runs start within seconds of each other, both read the same source snapshot, and both attempt to insert overlapping key ranges simultaneously. The second writer hits 1062 on rows the first already committed.
5REPLACE INTO used instead of INSERT ... ON DUPLICATE KEY UPDATE when the table has foreign-key children: REPLACE deletes the old row and inserts a new one. If a child table references the old row, MySQL may error or cascade-delete before 1062 is reached, but on tables without FK cascades the pattern still produces duplicate-key errors when the sequence generator reuses IDs.
6Auto-increment exhaustion or manual ID override: a script that explicitly sets the id column to a fixed value (e.g., for backfill) collides with rows inserted by the application layer that already claimed those IDs. Common when replaying historical data into a live table.
7Schema migration added a UNIQUE index on an existing column that already contains duplicates: ALTER TABLE ... ADD UNIQUE fails with 1062 at migration time, but the same error appears at insert time if the migration succeeded but the underlying data was not cleaned first.

How to fix it

1Step 1 — Read the error message precisely. `Duplicate entry '42' for key 'PRIMARY'` tells you the exact value and constraint name. Run `SHOW CREATE TABLE your_table;` to see which column(s) form that key, then query `SELECT id, COUNT(*) FROM your_table GROUP BY id HAVING COUNT(*) > 1;` to confirm whether duplicates already exist in the target.
2Step 2 — Switch ADF sink to Upsert write behavior. In the ADF Copy Activity → Sink tab, set Write behavior to Upsert, then specify the key columns under Key columns. ADF will generate `INSERT ... ON DUPLICATE KEY UPDATE` for MySQL sinks, making every pipeline run idempotent.
3Step 3 — For raw SQL or Stored Procedure activities use INSERT ... ON DUPLICATE KEY UPDATE explicitly: `INSERT INTO orders (id, status, updated_at) VALUES (42, 'shipped', NOW()) ON DUPLICATE KEY UPDATE status = VALUES(status), updated_at = VALUES(updated_at);` This is safe for retries because re-running the same values is a no-op.
4Step 4 — In dbt, add unique_key to the incremental model config block: `{{ config(materialized='incremental', unique_key='order_id') }}`. The MySQL/MariaDB dbt adapter translates this into INSERT ... ON DUPLICATE KEY UPDATE behind the scenes. Validate with `dbt run --select your_model --full-refresh` on a dev target first.
5Step 5 — Deduplicate source data before the load step. Use a CTE with ROW_NUMBER to keep only the latest record per key: `INSERT INTO target SELECT id, col1, col2 FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) AS rn FROM staging) t WHERE rn = 1 ON DUPLICATE KEY UPDATE col1 = VALUES(col1);`
6Step 6 — If the error occurs during an ALTER TABLE ... ADD UNIQUE migration, clean duplicates first. Find and delete or merge them: `DELETE t1 FROM your_table t1 INNER JOIN your_table t2 WHERE t1.id > t2.id AND t1.email = t2.email;` Then re-run the migration.
7Step 7 — For concurrent pipeline race conditions, serialize runs by adding a pipeline dependency or Fabric notebook mutex. In ADF use a trigger dependency or set the pipeline to allow only one active run at a time under Settings → Concurrency → Maximum concurrent runs = 1.

Example log output

com.microsoft.sqlserver.jdbc.SQLServerException: Duplicate entry '10047' for key 'orders.PRIMARY'
Activity 'Copy_orders_to_mysql' failed: ErrorCode=UserErrorDatabaseOperationFailed, Type=Microsoft.DataFactory.Common.ActivityException
[MySQL][ODBC 8.0(w) Driver][mysqld-8.0.32]Duplicate entry '10047' for key 'orders.PRIMARY'

Frequently asked questions

MySQL 1062 fix — how do I make my ADF pipeline idempotent?

In the Copy Activity Sink tab set Write behavior to Upsert and specify the key columns. For Stored Procedure activities, replace INSERT with INSERT ... ON DUPLICATE KEY UPDATE. Both approaches let you re-run the pipeline safely after a failure without manual cleanup.

MySQL 1062 retry — why does my pipeline keep failing on every retry?

Because the first partial run already wrote some rows to the target. Each retry attempts plain INSERTs again, hitting 1062 on every row that was committed in the previous attempt. The fix is to switch to upsert semantics so that re-inserting an existing key updates it instead of erroring.

dbt incremental model duplicate entry 1062 — what is the correct config?

Add `unique_key='your_primary_key_column'` to the `{{ config(...) }}` block. Without it, dbt appends all source rows on every run. With unique_key set, the MySQL adapter generates INSERT ... ON DUPLICATE KEY UPDATE, preventing duplicates on incremental runs.

How do I find which rows caused the MySQL 1062 error before deleting them?

Run `SELECT id, COUNT(*) c FROM your_table GROUP BY id HAVING c > 1 ORDER BY c DESC LIMIT 20;` replacing id with the column named in the error message. This shows you the duplicate values and how many copies exist, so you can decide whether to merge or drop the extras before re-running the load.

Source · dev.mysql.com/doc/mysql-errors/8.0/en/server-error-reference.html