Low severityconnectivity
PostgreSQL Error:
08003
What does this error mean?
SQLSTATE 08003 means the connection handle your application or driver references has already been closed on the server side — the TCP session is gone before the query was sent. In a data pipeline this typically surfaces mid-transaction: an ADF copy activity, dbt model run, or SQLAlchemy session tries to execute a statement, the driver hands it a pooled socket that the server already killed, and PostgreSQL immediately returns this error before any data is read or written. The symptom is an abrupt failure with no partial result, often wrapped in a higher-level message like 'connection closed unexpectedly' from your orchestration layer.
Common causes
- 1**idle_in_transaction_session_timeout fired mid-pipeline.** PostgreSQL closes sessions that sit inside an open transaction without activity for longer than the configured threshold. Long-running dbt models or ADF pipelines that open a transaction, pause for upstream processing, then resume will hit this when the pause exceeds the timeout — typically 30 s–5 min in managed services like Azure Database for PostgreSQL Flexible Server.
- 2**Connection pool returned a stale socket after a server restart or failover.** When the database server restarts (patch, failover, maintenance window), existing TCP connections are torn down. The pool does not know this until it tries to use one. The next query that gets that dead socket receives 08003 immediately.
- 3**Application held a connection open across a slow external call.** A pipeline that opens a transaction, calls an external API or writes to blob storage mid-transaction, then comes back to write to PostgreSQL will find the connection gone if the external call took longer than the server timeout allows.
- 4**TCP keepalive not configured, connection silently dropped by a firewall or NAT gateway.** Cloud NAT gateways (Azure NAT Gateway, AWS NAT) silently drop idle TCP connections after 4–10 minutes. The OS and application think the socket is still open; the server has already discarded it. The error only appears when the driver next writes to the dead socket.
- 5**`pool_recycle` interval longer than `wait_timeout` / server-side timeout.** If the pool recycles connections every 3600 s but the server closes idle connections after 600 s, there is a 2400-second window where the pool holds connections that are already dead server-side.
- 6**pgBouncer or other connection pooler in transaction mode closing the backend connection mid-session.** In transaction-pooling mode, pgBouncer may assign a different backend connection between statements. If the pipeline assumes connection-level state (temp tables, SET LOCAL, prepared statements), the mismatch causes the next statement to land on a connection that does not have that state — or on one that is closing.
- 7**Concurrent `pg_terminate_backend()` call from an admin or monitoring script.** A DBA running `SELECT pg_terminate_backend(pid)` to clear long-running queries will terminate any active pipeline connection, which raises 08003 on the application side.
How to fix it
- 1Step 1: Enable pre-ping / connection validation in your pool. In SQLAlchemy: `create_engine(url, pool_pre_ping=True)`. In psycopg2 connection pools: issue a `SELECT 1` before handing out a connection. This adds ~0.3 ms overhead per checkout but eliminates 08003 from stale sockets.
- 2Step 2: Set `pool_recycle` shorter than the server's idle timeout. Check the server timeout: `SHOW idle_in_transaction_session_timeout;` and `SHOW tcp_keepalives_idle;`. Set `pool_recycle` to 60–80 % of the lower value. Example SQLAlchemy: `create_engine(url, pool_recycle=300)` when server timeout is 600 s.
- 3Step 3: Verify and tune server-side timeouts on Azure Database for PostgreSQL: in the Azure Portal → your server → Server parameters → search `idle_in_transaction_session_timeout`. For pipelines that need long transactions, raise it or set it to 0 (disabled) only for the pipeline role: `ALTER ROLE pipeline_user SET idle_in_transaction_session_timeout = 0;`
- 4Step 4: Configure OS-level TCP keepalives so the OS probes the connection before the NAT gateway drops it. In libpq connection string: `keepalives=1&keepalives_idle=60&keepalives_interval=10&keepalives_count=5`. This sends a keepalive probe after 60 s of inactivity, retrying every 10 s up to 5 times.
- 5Step 5: Identify currently open idle-in-transaction connections before they time out: `SELECT pid, usename, state, now() - state_change AS idle_duration, query FROM pg_stat_activity WHERE state = 'idle in transaction' ORDER BY idle_duration DESC;` Terminate the longest ones if they are from a hung pipeline: `SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE state = 'idle in transaction' AND now() - state_change > interval '5 minutes';`
- 6Step 6: Add retry logic with exponential backoff around database calls in your pipeline. Catch `psycopg2.OperationalError` (or `sqlalchemy.exc.OperationalError`) where `pgcode == '08003'`, close and discard the connection, then re-acquire a fresh one from the pool and retry up to 3 times. Do not retry on the same connection object.
- 7Step 7: If using pgBouncer, switch the pipeline's database to session-pooling mode (`pool_mode = session` in pgbouncer.ini for that database entry) or ensure the pipeline does not rely on connection-level state across statements. Alternatively, connect the pipeline directly to PostgreSQL bypassing pgBouncer for long-running jobs.
Example log output
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) FATAL: connection does not exist
SSLERROR: decryption failed or bad record mac
[SQL: SELECT id, updated_at FROM public.fact_orders WHERE updated_at > %(ts)s]
(Background on this error at: https://sqlalche.me/e/14/e3q8)ERROR 1 of 1 in model fact_orders (models/marts/fact_orders.sql)
Database Error in model fact_orders
connection does not exist
compiled Code at target/compiled/marts/fact_orders.sqlActivity failed: CopyData | ErrorCode: UserErrorOdbcOperationFailed | Message: ODBC Source, Error [08003] connection does not exist | Pipeline: pl_ingest_postgres | Duration: 00:02:14