Runbook: Deployment Rollback
A bad deploy reached development, staging, or production and must be reverted. Two paths — always try Path A first; Path B is break-glass for database damage.
Canonical tooling: renewa-one/scripts/rollback-deploy.sh (reworked in PR #1918, spec docs/superpowers/specs/2026-06-09-mpg-rollback-rework-design.md). Both paths live-tested 2026-06-10.
Path A — Image rollback (first resort, non-destructive)
Use when: the bad deploy is code-level (crash, broken feature, bad config) and the database is intact. Migrations are kept additive/backward-compatible (expand/contract), so the previous image runs safely against the current schema.
./renewa-one/scripts/rollback-deploy.sh <env> --dry-run   # verify resolved digest first
./renewa-one/scripts/rollback-deploy.sh <env>
What it does:
- Resolves the previous image digest from the deployment-record GitHub issues.
- Re-promotes it via gh workflow run promote-image.yml, so the rollback stays on the CI deploy path (no local flyctl deploy).
- Database untouched.
Notes:
- flyctl releases rollback does not exist; do not attempt it.
- Image digests join with @ (repo@sha256:…), never with a colon (:).
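The digest rule and the CI-path promotion can be sketched as two small helpers. This is an illustration under stated assumptions, not the contents of rollback-deploy.sh: the function names are made up here, and the workflow input names "environment" and "image" are hypothetical (check promote-image.yml for the real ones).

```shell
#!/usr/bin/env bash

# Join a repository and a digest with "@". Anything that is not a full
# sha256 digest (for example a tag such as "latest") is rejected.
image_ref() {
  local repo=$1 digest=$2
  if [[ ! $digest =~ ^sha256:[0-9a-f]{64}$ ]]; then
    echo "invalid digest: ${digest}" >&2
    return 1
  fi
  echo "${repo}@${digest}"
}

# Print the promotion command (DRY_RUN=1, the default in this sketch) or run it.
# ASSUMPTION: the -f input names "environment" and "image" are placeholders.
promote_image() {
  local env=$1 ref=$2
  local cmd=(gh workflow run promote-image.yml -f "environment=${env}" -f "image=${ref}")
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Defaulting to dry-run mirrors the advice to verify the resolved digest before promoting; set DRY_RUN=0 only once the printed command looks right.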
Path B — Database break-glass (rare; script prepares, human flips)
Use when: a destructive migration or data corruption requires restoring the pre-deploy backup. Backup IDs are recorded in the deployment-record issue (created by snapshot-before-deploy.sh on every mutating deploy).
Critical mental model: flyctl mpg restore always forks a NEW cluster (new ID, new hostname). It can never restore in place. The CLUSTER_ID argument is the source whose backup is read.
The script:
- Restores the pre-deploy backup into a new cluster and waits for readiness (~5 min).
- Reads the current DB URLs from the running app via flyctl ssh console -C 'printenv DATABASE_URL', so no Infisical login is needed mid-incident.
- Verifies connectivity as BOTH app-user and migration-user through a flyctl mpg proxy tunnel (*.flympg.net hostnames resolve only on Fly's private network, so direct psql from a laptop can never reach them).
- Prints the manual flip checklist and stops. No destructive step is automated.
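Because the cluster hostname is unreachable from a laptop, a manual connectivity check has to point the client at the local end of the tunnel instead. A minimal sketch of that rewrite, assuming the URL has the usual postgres://user:password@host:port/db shape with any "@" in the password percent-encoded; the helper name and the local port are hypothetical:

```shell
#!/usr/bin/env bash

# Rewrite a database URL so it targets a local tunnel port, keeping the
# credentials and database name unchanged.
tunnel_url() {
  local url=$1 port=$2
  local creds=${url%%@*}   # "postgres://user:password"
  local rest=${url#*@}     # "host:5432/dbname"
  local path=""
  case $rest in
    */*) path="/${rest#*/}" ;;   # keep "/dbname" (and any query string)
  esac
  echo "${creds}@127.0.0.1:${port}${path}"
}

# Usage, assuming a proxy tunnel is already listening on local port 16380:
#   psql "$(tunnel_url "$DATABASE_URL" 16380)" -c 'select 1'
```

Running the same check once with the app-user URL and once with the migration-user URL covers both roles.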
Manual flip (human executes, in order):
- Update DATABASE_URL + DATABASE_URL_MIGRATION in Infisical (project renewa-one, matching env scope), never via flyctl secrets set (the Infisical sync would overwrite it).
- Redeploy / wait for Auto Redeploy.
- Update the env→cluster-id map in renewa-one/scripts/lib/mpg-clusters.sh (single source of truth).
- Only after the new cluster is verified serving: detach + destroy the old cluster.
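Before touching the old cluster, it can help to confirm that both URLs seen by the running app really name the new cluster's hostname. The sketch below is one way to do that comparison; the helper names are hypothetical, IPv6 literals are not handled, and only hostnames are printed so credentials never reach the terminal.

```shell
#!/usr/bin/env bash

# Extract the hostname from postgres://user:password@host:port/db.
db_host() {
  local rest=${1#*@}    # drop scheme and credentials
  rest=${rest%%/*}      # drop "/dbname" and anything after it
  echo "${rest%%:*}"    # drop ":port"
}

# Succeed only if every URL given points at the expected hostname.
urls_point_at() {
  local expected=$1 url host
  shift
  for url in "$@"; do
    host=$(db_host "$url")
    if [ "$host" != "$expected" ]; then
      echo "still pointing at ${host}, expected ${expected}" >&2
      return 1
    fi
  done
}

# Usage, with the two values read from the redeployed app:
#   urls_point_at "<new-cluster-hostname>" "$DATABASE_URL" "$DATABASE_URL_MIGRATION"
```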
Facts that save time:
- pgBackRest physical restore preserves roles and password hashes, so existing app-user / migration-user credentials work on the forked cluster.
- 🚨 NEVER run flyctl mpg destroy on production before the replacement is verified. There is zero recovery.
Verification
- flyctl status -a <app> shows the expected image.
- GET https://<app>/api/health/ready returns 200.
- Deployment-record issue updated with what was rolled back and why.
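The readiness check can be polled rather than retried by hand while the redeploy settles. A minimal sketch, assuming curl is available; the function names, attempt count, and delay are illustrative defaults, not values taken from the rollback script:

```shell
#!/usr/bin/env bash

# Print the HTTP status code for a URL (curl prints 000 if the request fails).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1"
}

# Poll the URL until it returns 200 or the attempts run out.
wait_ready() {
  local url=$1 attempts=${2:-30} delay=${3:-5} i code
  for ((i = 1; i <= attempts; i++)); do
    code=$(http_status "$url") || code=000
    if [ "$code" = "200" ]; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
  done
  echo "not ready after $attempts attempts (last status: $code)" >&2
  return 1
}

# Usage:
#   wait_ready "https://<app>/api/health/ready"
```

A non-zero exit makes the function usable as a gate in front of any later step that should run only once the app is serving.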
Related
- Database Readonly Incident — if the trigger was a readonly database, fix the disk first
- Migration Recovery — when the problem is migration state, not the image
- Secret Management Operations — the Infisical flip in detail