Runbook: Database Readonly Incident

Fly Managed Postgres silently switches a cluster to readonly when its volume passes ~90% disk usage. The failure masquerades as a connection problem, which sends you debugging the wrong layer.

Symptoms

Atlas / lib/pq during deploy or migration: sql/migrate: write revision: driver: bad connection
App writes fail while reads still work
Postgres error code 25006 (read_only_sql_transaction), which drivers often surface as “bad connection”
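
As a rough triage aid, the signatures above can be matched against a log line before anyone starts debugging the connection layer. This is a minimal sketch: `looks_like_readonly` is a hypothetical helper, not part of flyctl, Atlas, or lib/pq, and the pattern is deliberately broad, so a match is a hint and not proof.

```shell
# Hypothetical helper: returns 0 when a log line matches one of the readonly
# signatures (SQLSTATE 25006, "read-only"/"read_only", or "bad connection").
looks_like_readonly() {
  printf '%s\n' "$1" | grep -Eiq '25006|read[-_ ]only|bad connection'
}

# Example: classify the error from a failed migration.
if looks_like_readonly "sql/migrate: write revision: driver: bad connection"; then
  echo "possible readonly mode: check disk before debugging the connection"
fi
```

A match only says "check the disk first"; a genuine connection fault can still produce "bad connection".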

Diagnosis — check this FIRST

flyctl checks list -a <db-app>
flyctl mpg status <cluster-id>

Look for failing disk/health checks before touching connection strings, PgBouncer, or credentials. A “bad connection” message during a write with working reads is readonly mode until proven otherwise.
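
If the checks are inconclusive, the session itself can be asked whether it is read-only. The sketch below assumes `psql` is installed and that `DATABASE_URL` (an assumed variable name) reaches the cluster, for example through a local proxy. `SHOW transaction_read_only` is a standard PostgreSQL setting; `readonly_verdict` is a hypothetical helper that only interprets its output.

```shell
# Hypothetical helper: maps the output of `SHOW transaction_read_only`
# ("on" or "off") to a verdict; anything else is reported as unknown.
readonly_verdict() {
  case "$1" in
    on)  echo "readonly" ;;
    off) echo "writable" ;;
    *)   echo "unknown" ;;
  esac
}

# Usage, assuming DATABASE_URL is set and psql is on PATH:
#   readonly_verdict "$(psql -Atc 'SHOW transaction_read_only;' "$DATABASE_URL")"
```

An "unknown" verdict usually means the query itself failed, which points back at a real connection problem.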

Resolution

Extend the volume (Fly dashboard or flyctl) — readonly lifts automatically once usage drops below the threshold.
Re-run the failed migration/deploy.
Afterwards: investigate what consumed the disk (table bloat, WAL accumulation, log growth) so it does not recur.
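
When choosing the new volume size, it helps to aim well below the ~90% threshold instead of just under it. The arithmetic can be sketched as follows; `min_volume_gb` is a hypothetical helper, and the 70% target is an illustrative assumption, not a Fly recommendation.

```shell
# Hypothetical helper: smallest whole-GB volume size at which the current
# usage ($1, in GB) is at or below the target percentage ($2).
# Uses integer ceiling division: ceil(used * 100 / target).
min_volume_gb() {
  echo $(( ($1 * 100 + $2 - 1) / $2 ))
}

min_volume_gb 46 70   # prints 66
```

With 46 GB used, a 66 GB volume puts usage just under 70%, which leaves headroom before the threshold is reached again.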

Verification

flyctl checks list -a <db-app> reports all checks passing.
A write succeeds (re-run the deploy’s release_command or a trivial UPDATE via flyctl mpg proxy).
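
One way to test a write without changing any data is to attempt it inside a transaction that is rolled back. This is a sketch under assumptions: `verify_write` and the table name `readonly_probe_tmp` are hypothetical, the connecting role needs permission to create tables, and the connection URL is assumed to reach the cluster (for example through the proxy mentioned above).

```shell
# Hypothetical helper: attempts a write inside a transaction that is rolled
# back, so nothing persists. Returns non-zero if psql reports an error,
# for example a rejected write on a read-only cluster.
# PSQL can be overridden for testing; it defaults to the real psql binary.
verify_write() {
  "${PSQL:-psql}" -X -q -v ON_ERROR_STOP=1 -d "$1" <<'SQL'
BEGIN;
CREATE TABLE readonly_probe_tmp (id int);
ROLLBACK;
SQL
}

# Usage, assuming DATABASE_URL is set:
#   verify_write "$DATABASE_URL" && echo "writes OK" || echo "write rejected"
```

The `ON_ERROR_STOP` variable makes psql exit with a non-zero status at the first failing statement, so the function's return code reflects whether the write was accepted.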

Related

Deployment Rollback: if a deploy half-applied during the incident
Migration Recovery: if the migration revision table is now inconsistent

RENEWA Knowledge Base
