Critical severitycapacity
PostgreSQL Error:
53100
What does this error mean?
SQLSTATE 53100 means PostgreSQL tried to write data — a row insert, WAL segment, temp file for a sort/hash, or VACUUM cleanup — and the OS returned ENOSPC (no space left on device). All write operations stop immediately: INSERTs, UPDATEs, COPY commands, and even autovacuum. In a data-pipeline context this typically surfaces as a failed dbt run, an ADF Copy Activity error, or an Airflow task hitting a psycopg2.errors.DiskFull exception. The database stays online for reads, but any transaction that needs to write — including the WAL flush on commit — will fail. On managed services like Azure Database for PostgreSQL or Amazon RDS, the error may auto-resolve if storage autogrow is enabled, but self-hosted instances require manual intervention.
Common causes
- 1Write-heavy ETL jobs (COPY, bulk INSERT, CREATE TABLE AS) filling the data directory or WAL directory faster than disk provisioning allows.
- 2WAL accumulation from an inactive replication slot — `pg_replication_slots` shows a slot with `active = false` while `pg_wal_lsn_diff()` keeps growing, preventing WAL cleanup.
- 3Log files growing unbounded because `log_rotation_age` and `log_rotation_size` are not set, or `logging_collector` writes to the same partition as PGDATA.
- 4Temporary files from large sorts or hash joins that exceed `work_mem` — PostgreSQL spills to disk in `pgsort*` and `pghash*` files inside `base/pgsql_tmp/`.
- 5Dead tuple bloat after disabling or delaying autovacuum — the table and its TOAST data keep growing, and VACUUM itself needs disk space to operate.
- 6Point-in-time recovery (PITR) or continuous archiving (`archive_command`) failing silently, causing WAL segments to accumulate in `pg_wal/` because PostgreSQL will not remove un-archived WAL.
How to fix it
- 1Step 1: Check available disk space and identify the largest consumers — run `df -h $(psql -tAc "SHOW data_directory")` and `du -sh /var/lib/postgresql/*/main/pg_wal` to see how much WAL is using.
- 2Step 2: Remove old WAL files held by inactive replication slots — identify them with `SELECT slot_name, active, pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS lag_bytes FROM pg_replication_slots;` then drop unused slots with `SELECT pg_drop_replication_slot('slot_name');`.
- 3Step 3: Clean up old log files and temp files — `find /var/log/postgresql -name '*.log' -mtime +7 -delete` and check `base/pgsql_tmp/` for leftover spill files from crashed queries.
- 4Step 4: Reclaim dead-tuple space on the largest tables — `VACUUM (VERBOSE) schema.table_name;` (use VACUUM FULL only if you can afford an exclusive lock, as it rewrites the entire table and needs extra disk space to do so).
- 5Step 5: On Azure Database for PostgreSQL, enable storage autogrow: Azure Portal → Server → Settings → Compute + storage → toggle 'Storage auto-growth'. On RDS, modify the instance to increase `allocated_storage`.
- 6Step 6: Add a disk-usage alert — configure OS-level monitoring with a Prometheus `node_exporter` alert for `node_filesystem_avail_bytes < 5e9` on the PGDATA mount, or use CloudWatch/Azure Monitor on managed instances.
- 7Step 7: Move WAL to a separate mount to isolate it from data growth — set `wal_directory` in postgresql.conf (requires restart) or symlink `pg_wal` to a dedicated volume.
Example log output
2026-05-11 14:32:07.891 UTC [12345] ERROR: could not extend file "base/16384/24601": No space left on device
2026-05-11 14:32:07.891 UTC [12345] HINT: Check free disk space.
2026-05-11 14:32:07.892 UTC [12345] STATEMENT: COPY staging.events FROM STDIN WITH (FORMAT csv)