Runbook: Deployment Rollback

Use this runbook when a bad deploy has reached development, staging, or production and must be reverted. There are two paths: always try Path A first; Path B is break-glass for database damage only.

Canonical tooling: renewa-one/scripts/rollback-deploy.sh (reworked in PR #1918, spec docs/superpowers/specs/2026-06-09-mpg-rollback-rework-design.md). Both paths live-tested 2026-06-10.

Path A — Image rollback (first resort, non-destructive)

Use when: the bad deploy is code-level (crash, broken feature, bad config) and the database is intact. Migrations are kept additive/backward-compatible (expand/contract), so the previous image runs safely against the current schema.

./renewa-one/scripts/rollback-deploy.sh <env> --dry-run   # verify resolved digest first
./renewa-one/scripts/rollback-deploy.sh <env>

What it does:

Resolves the previous image digest from the deployment-record GitHub issues.
Re-promotes it via gh workflow run promote-image.yml — the rollback stays on the CI deploy path (no local flyctl deploy).
Database untouched.
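
As an illustration of the first step only: the real parsing inside rollback-deploy.sh is not shown in this runbook. The sketch below assumes the deployment-record text is available newest-first and that each record mentions its image digest; previous_digest is a hypothetical helper name. It prints the most recent digest that differs from the currently deployed one.

```shell
# Hypothetical helper, not the script's actual logic.
# stdin: deployment-record text, newest record first.
# stdout: the newest digest that differs from the current (first) one.
previous_digest() {
  grep -Eo 'sha256:[0-9a-f]{64}' |
    awk 'NR == 1 { current = $0; next }   # remember the current digest
         $0 != current { print; exit }'    # first different one is the rollback target
}
```

Skipping repeats of the current digest matters because a redeploy of the same image would otherwise be picked as the "previous" one and the rollback would change nothing.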

Notes:

flyctl releases rollback does not exist — do not attempt it.
An image digest is joined to the repository with @ (repo@sha256:…), never with : (a colon introduces a tag, not a digest).
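
A minimal sketch of the digest rule above, useful when building an image reference by hand. image_ref and the registry name in the usage note are hypothetical and not part of rollback-deploy.sh; the check assumes a sha256 digest, which is 64 lowercase hex characters.

```shell
# Hypothetical helper: build a digest-pinned image reference.
# $1 = repository, $2 = digest (sha256:<64 hex chars>)
image_ref() {
  if ! printf '%s\n' "$2" | grep -Eq '^sha256:[0-9a-f]{64}$'; then
    echo "invalid digest: $2" >&2
    return 1
  fi
  # Digests are attached with '@'; ':' would be parsed as a tag.
  printf '%s@%s\n' "$1" "$2"
}
```

For example, image_ref registry.example/app "$DIGEST" prints registry.example/app@sha256:… and fails loudly if a tag such as latest is passed by mistake.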

Path B — Database break-glass (rare; script prepares, human flips)

Use when: a destructive migration or data corruption requires restoring the pre-deploy backup. Backup IDs are recorded in the deployment-record issue (created by snapshot-before-deploy.sh on every mutating deploy).

Critical mental model: flyctl mpg restore always forks a NEW cluster (new ID, new hostname). It can never restore in place. The CLUSTER_ID argument names the source cluster whose backup is read, not the restore target.

The script:

Restores the pre-deploy backup into a new cluster and waits for readiness (~5 min).
Reads the current DB URLs from the running app via flyctl ssh console -C 'printenv DATABASE_URL' — no Infisical login needed mid-incident.
Verifies connectivity as BOTH app-user and migration-user through a flyctl mpg proxy tunnel (*.flympg.net hostnames resolve only on Fly’s private network — direct psql from a laptop can never reach them).
Prints the manual flip checklist and stops. No destructive step is automated.
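
The proxy step can be pictured as rewriting each database URL so that it points at the local end of the tunnel while keeping the original user, password, and database name. The sketch below is not taken from rollback-deploy.sh: proxied_url, the sample hostname and credentials in the usage note, and the local port are all hypothetical, and it assumes credentials in the URL are percent-encoded so that the first @ separates them from the host.

```shell
# Hypothetical helper: point a connection URL at a local proxy tunnel.
# $1 = original connection URL, $2 = local port the tunnel listens on
proxied_url() {
  # Replace everything between the first '@' and the next '/' or '?'
  # (host and optional port) with the loopback address and local port.
  printf '%s\n' "$1" | sed -E "s#@[^/?]+#@127.0.0.1:$2#"
}
```

With a tunnel open on an assumed port 16380, psql "$(proxied_url "$DATABASE_URL" 16380)" -c 'select 1' would exercise one role's login; repeating it with DATABASE_URL_MIGRATION covers the second role.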

Manual flip (human executes, in order):

Update DATABASE_URL + DATABASE_URL_MIGRATION in Infisical (project renewa-one, matching env scope) — never via flyctl secrets set (the Infisical sync would overwrite it).
Redeploy, or wait for Auto Redeploy to pick up the changed secrets.
Update the env→cluster-id map in renewa-one/scripts/lib/mpg-clusters.sh (single source of truth).
Only after the new cluster is verified serving: detach + destroy the old cluster.

Facts that save time:

pgBackRest physical restore preserves roles and password hashes — existing app-user/migration-user credentials work on the forked cluster.
🚨 NEVER run flyctl mpg destroy on production before the replacement is verified: a destroyed cluster cannot be recovered.

Verification

flyctl status -a <app> shows the expected image.
GET https://<app>/api/health/ready returns 200.
Deployment-record issue updated with what was rolled back and why.
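
The readiness check above can be polled rather than run once, since a freshly promoted image may take a few seconds to start serving. This is a sketch under stated assumptions: wait_ready is a hypothetical helper, and the attempt count and delay are arbitrary; only the /api/health/ready path and the expected 200 come from this runbook.

```shell
# Hypothetical helper: poll a readiness URL until it returns HTTP 200.
# $1 = URL, $2 = max attempts, $3 = seconds to sleep between attempts
wait_ready() {
  attempt=1
  while [ "$attempt" -le "$2" ]; do
    # -w '%{http_code}' prints only the status code; 000 means no response.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1") || code=000
    if [ "$code" = "200" ]; then
      echo "ready after $attempt attempt(s)"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$3"
  done
  echo "not ready after $2 attempts (last status: $code)" >&2
  return 1
}
```

For example, wait_ready "https://<app>/api/health/ready" 30 5 gives the app up to roughly two and a half minutes before reporting failure.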

Related

Database Readonly Incident: if the trigger was a readonly database, fix the disk first
Migration Recovery: when the problem is migration state, not the image
Secret Management Operations: the Infisical flip in detail
