Medium severitydata format
PostgreSQL Error:
22001
What does this error mean?
SQLSTATE 22001 fires when an INSERT or UPDATE tries to store a string value that exceeds the defined length of the target column (e.g., pushing 200 characters into a VARCHAR(50)). In data pipelines, this typically surfaces during bulk loads from staging tables, CSV imports via COPY, or when an ETL tool like dbt or ADF maps a source column without length validation. The immediate symptom is a failed statement — PostgreSQL does not silently truncate by default, it raises an exception and rolls back the transaction. Engineers see this as a job failure in their orchestrator, often with the message 'ERROR: value too long for type character varying(N)'.
Common causes
- 1Source data contains values longer than the column definition allows — common when ingesting from APIs or NoSQL sources where string length is unbounded (e.g., a user comment field with no max length upstream).
- 2Schema migration reduced a column's length (ALTER COLUMN ... TYPE VARCHAR(N)) but existing or in-flight data still carries the original longer values.
- 3ETL pipeline concatenates multiple fields into one target column without recalculating the combined length — e.g., first_name || ' ' || last_name exceeding VARCHAR(100).
- 4Character encoding mismatch: a UTF-8 multi-byte string that is N characters but M > N bytes when the column uses byte-based length semantics in certain client libraries.
- 5Upstream schema drift — a source system added a new enum value or extended a field, and the downstream PostgreSQL schema was never updated to match.
- 6Default values or fallback strings in error-handling code (e.g., a long error message stored in a status_reason column) exceed the column width.
How to fix it
- 1Step 1: Identify the offending column and its limit: `\d+ schema_name.table_name` in psql, or query `SELECT column_name, character_maximum_length FROM information_schema.columns WHERE table_name = 'your_table' AND data_type = 'character varying';`
- 2Step 2: Find the actual max length in your source data: `SELECT MAX(LENGTH(column_name)) FROM staging_table;` — this tells you how much headroom you need.
- 3Step 3: If widening the column is acceptable, alter it: `ALTER TABLE schema_name.table_name ALTER COLUMN column_name TYPE VARCHAR(500);` — this takes an ACCESS EXCLUSIVE lock on small tables but may need off-hours for large ones.
- 4Step 4: If you cannot widen the column, add a truncation step in your transformation: `LEFT(source_column, 50)` in SQL or `SUBSTRING(value, 1, 50)` in dbt models.
- 5Step 5: For dbt pipelines, add a test to catch drift early: `- dbt_utils.accepted_length: max_length: 50` on the column in your schema.yml.
- 6Step 6: In ADF, add a Derived Column transformation in your Data Flow that applies `left(column, 50)` before the sink write.
- 7Step 7: Audit source systems for recent schema changes: `SELECT * FROM pg_stat_activity WHERE query LIKE '%ALTER%' AND state = 'idle';` won't catch past DDL, but checking migration logs or source API changelogs will reveal when the length changed.
Example log output
ERROR: value too long for type character varying(50)
DETAIL: String of 137 characters is too long for type character varying(50).
CONTEXT: COPY customers, line 4892, column company_name: "Acme International Holdings & Subsidiaries Corp..."