Medium severitydata integrity
SQL Server Error:
2627
What does this error mean?
SQL Server error 2627 is raised when an INSERT or UPDATE tries to store a value that already exists in a column protected by a PRIMARY KEY or UNIQUE constraint. The engine rejects the entire statement and rolls back the affected row. In a data-pipeline context this typically surfaces when a copy activity runs without idempotency guards — a second pipeline execution attempts to re-insert rows that were already loaded in a previous run. The engineer sees the pipeline fail at the sink step, the error message includes the constraint name and the offending duplicate key value, and the target table is left in whatever partial state it was in at the point of failure.
Common causes
- 1An ADF pipeline was triggered twice — manually or via a duplicate schedule — and the copy activity uses 'Insert' write behavior without a deduplication check. The second run attempts to insert the same primary key values that were already written in the first run.
- 2Source data contains duplicate key values before they reach the target. A staging layer or upstream extraction did not deduplicate, so multiple rows carry the same business key and only the first one inserts successfully.
- 3A dbt incremental model is configured with `strategy: append` instead of `strategy: merge`. On each run it appends all records from the source rather than matching on the unique key and updating existing rows.
- 4An ADF upsert is configured but the key columns in the sink settings do not match the actual primary key of the target table. The MERGE statement generated by ADF cannot match existing rows, so every row falls into the INSERT branch and collides with existing data.
- 5A pipeline was retried after a partial failure mid-batch. The first attempt wrote a subset of rows before failing; the retry starts from the beginning and tries to re-insert rows that were already committed because the source query has no checkpoint or watermark.
- 6A TRUNCATE + INSERT pattern was replaced with a pure INSERT to avoid locking, but the TRUNCATE step was removed without adding duplicate-safe write logic. Rows accumulate across runs until a primary key collision occurs.
- 7Multiple pipelines write to the same target table concurrently — for example a full-load job and a delta job running in overlapping windows. Both read the same source rows and race to insert them, causing one to hit a duplicate key error.
How to fix it
- 1Step 1 — Identify the duplicate key. The error message contains the constraint name and the offending value. Confirm which rows are duplicated: SELECT <key_columns>, COUNT(*) AS cnt FROM <target_table> GROUP BY <key_columns> HAVING COUNT(*) > 1; Note the key value from the error message and cross-reference it with the source query.
- 2Step 2 — For ADF Copy Activity: open the sink tab, change 'Write behavior' from 'Insert' to 'Upsert', and add the key columns (e.g. id or composite business key) in the 'Key columns' field. ADF will generate a MERGE statement that updates existing rows and inserts new ones.
- 3Step 3 — For dbt incremental models: change the config block to strategy: merge and specify unique_key. Example: {{ config(materialized='incremental', unique_key='order_id', incremental_strategy='merge') }}. Run dbt run --full-refresh once to rebuild the table cleanly, then future runs will merge.
- 4Step 4 — For custom T-SQL pipelines, replace bare INSERT with an idempotent pattern: MERGE target_table AS tgt USING source_cte AS src ON tgt.id = src.id WHEN MATCHED THEN UPDATE SET tgt.col1 = src.col1, tgt.col2 = src.col2 WHEN NOT MATCHED THEN INSERT (id, col1, col2) VALUES (src.id, src.col1, src.col2); This is atomic and retry-safe.
- 5Step 5 — If INSERT only (no updates needed): use INSERT ... WHERE NOT EXISTS: INSERT INTO target (id, col1, col2) SELECT id, col1, col2 FROM source WHERE NOT EXISTS (SELECT 1 FROM target t WHERE t.id = source.id); This skips rows that already exist instead of failing.
- 6Step 6 — Add a watermark or checkpoint to delta pipelines to avoid re-processing already-loaded data. Store the last successful high-watermark (e.g. max updated_at) in a control table and filter source rows: WHERE updated_at > (SELECT last_watermark FROM pipeline_control WHERE pipeline_name = 'my_pipeline').
- 7Step 7 — If the pipeline is retrying after a partial load, check whether the target table needs to be cleaned before the retry. DELETE FROM target WHERE batch_id = '<failed_batch_id>'; then re-run. For full-load pipelines, switch to a TRUNCATE + INSERT pattern inside a transaction so partial states never persist.
Example log output
ErrorCode=UserErrorSqlException,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=SQL operation failed with the following error: 'Violation of PRIMARY KEY constraint 'PK_orders'. Cannot insert duplicate key in object 'dbo.orders'. The duplicate key value is (10043).',Source=Microsoft.DataTransfer.ClientLibrary