Runbook: Migration Recovery
Use this runbook when Atlas migration state has diverged from what you expect, typically during promotion to staging/production or after env-gated data migrations.
Gotcha 1: atlas migrate set <V> is INCLUSIVE
atlas migrate set <version> marks every version up to and including <version> as applied, without running any of them. If real, unapplied migrations exist below <version>, they are silently skipped (this hit production during the promotion of PR #1789).
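Before marking a version, it can help to list which files the inclusive behaviour would skip. The following is a minimal sketch, not part of the existing tooling: the function name is hypothetical, and it assumes Atlas-style file names of the form <version>_<name>.sql with fixed-width timestamp versions.

```shell
#!/usr/bin/env bash
# Sketch only: list versions that `atlas migrate set <target>` would mark as
# applied without running them, given the last version the database applied.
# Assumption: files are named <version>_<name>.sql with fixed-width versions.

# versions_skipped_by_set <migrations_dir> <current_version> <target_version>
versions_skipped_by_set() {
  local dir="$1" current="$2" target="$3" file version
  for file in "$dir"/*.sql; do
    [ -e "$file" ] || continue          # directory has no .sql files
    version="$(basename "$file")"
    version="${version%%_*}"            # keep the part before the first "_"
    # String comparison is sufficient for fixed-width numeric versions.
    if [[ "$version" > "$current" && "$version" < "$target" ]]; then
      echo "$version"
    fi
  done
}
```

If this prints anything, those versions need a real apply before the target is marked. The current version would come from atlas migrate status.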
Correct sequence when marking a version:
atlas migrate apply --to-version <PREV> # actually apply everything below first
atlas migrate set <V> # then mark only the gated one
Gotcha 2: env-gated migrations (renewa:atlas:skip-on)
Destructive/data migrations can be gated per environment with an in-file directive (PR #1797):
-- renewa:atlas:skip-on staging,production
atlas-migrate.sh reads the directive and uses atlas migrate set to skip on the named envs (APP_ENV). When debugging “migration ran locally but not in staging”, check for this directive before suspecting the pipeline.
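To check a single file quickly, the directive can be parsed by hand. The sketch below is an illustration and not the actual logic in atlas-migrate.sh: the function name is hypothetical, and it assumes the directive sits on its own line as a SQL comment with a comma-separated env list.

```shell
#!/usr/bin/env bash
# Sketch only: report whether a migration file is gated for a given env.
# Assumption: the directive is a line of the form
#   -- renewa:atlas:skip-on env1,env2

# should_skip_on_env <migration_file> <env>; returns 0 if <env> is listed.
should_skip_on_env() {
  local file="$1" env="$2" line envs name
  local -a names
  line="$(grep -m1 -E '^-- renewa:atlas:skip-on[[:space:]]+' "$file")" || return 1
  envs="${line#-- renewa:atlas:skip-on}"   # drop the directive prefix
  envs="${envs//[[:space:]]/}"             # drop all whitespace
  IFS=',' read -r -a names <<< "$envs"
  for name in "${names[@]}"; do
    [ "$name" = "$env" ] && return 0
  done
  return 1
}
```

For example, should_skip_on_env path/to/migration.sql "$APP_ENV" && echo "gated on this env" shows whether a missing change in staging is expected.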
Gotcha 3: atlas.sum integrity
atlas.sum is an append-only Merkle chain — never re-hash from scratch, never hand-edit. If your migration’s timestamp sorts before one already on main:
- Rename your migration to a current timestamp (regenerate via make db-generate, don’t hand-edit).
- Reset atlas.sum to main’s version.
- Run atlas migrate hash to append only your entry.
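Whether the rename is needed at all can be checked before touching atlas.sum. This is a minimal sketch with hypothetical function names; it assumes <version>_<name>.sql file names with fixed-width timestamp versions and a local directory holding main's migrations (for example, a separate checkout of main).

```shell
#!/usr/bin/env bash
# Sketch only: does my migration sort after everything already on main?
# Assumption: files are named <version>_<name>.sql with fixed-width versions.

# latest_version <migrations_dir>: print the highest version prefix found.
latest_version() {
  local dir="$1" file base latest=""
  for file in "$dir"/*.sql; do
    [ -e "$file" ] || continue
    base="$(basename "$file")"
    base="${base%%_*}"
    if [[ -z "$latest" || "$base" > "$latest" ]]; then
      latest="$base"
    fi
  done
  echo "$latest"
}

# sorts_after_main <my_migration_file> <main_migrations_dir>
# Returns 0 if my version is newer than the newest version in the directory.
sorts_after_main() {
  local mine latest
  mine="$(basename "$1")"
  mine="${mine%%_*}"
  latest="$(latest_version "$2")"
  [[ -z "$latest" || "$mine" > "$latest" ]]
}
```

A non-zero return means the migration sorts before something on main, so the rename and re-hash steps above apply.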
Never fabricate migration artifacts (SQL, journal entries, timestamps, sums) by hand — always via drizzle-kit / atlas CLIs. Round timestamps in history are the fingerprint of past fabrication.
Gotcha 4: N-1 compatibility gate
Renames/drops/SET NOT NULL are gated by scripts/check-migration-n1.sh (PR #1938): renames ship a compat view plus a renewa:n-1-shim: <old> drop-with #<issue> annotation. The gate fails PRs whose live shim’s tracking issue was closed — reopen the issue or drop the shim in the same PR.
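When the gate fails, listing the live shims with their tracking issues narrows down which one to look at. The sketch below is not scripts/check-migration-n1.sh: the function name is hypothetical, and it assumes the annotation appears verbatim in files under the migrations directory in the form shown above, with a single-token old name.

```shell
#!/usr/bin/env bash
# Sketch only: print "<old_name> <issue_number>" for each shim annotation.
# Assumption: annotations look like
#   renewa:n-1-shim: <old> drop-with #<issue>

# list_n1_shims <migrations_dir>
list_n1_shims() {
  grep -rhoE 'renewa:n-1-shim: [^[:space:]]+ drop-with #[0-9]+' "$1" \
    | sed -E 's/^renewa:n-1-shim: ([^[:space:]]+) drop-with #([0-9]+)$/\1 \2/'
}
```

Each issue number in the output can then be checked by hand to see whether it is still open.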
Diagnosis quick path
atlas migrate status --url "$DATABASE_URL_MIGRATION" # what does the cluster think is applied?
atlas migrate validate # is the local dir consistent with atlas.sum?
Compare against atlas/migrations/ on main. If status reports a dirty/partial version: fix the cause (often Database Readonly Incident), then resolve the revision before re-applying.
Related
- Database Migrations — the normal workflow
- Deployment Rollback — when the schema change itself must be reverted (Path B)