2026-05-13 — Renovate bun bump bypassed CI, broke main
PR #1725 (Renovate: bun 1.3.13 → 1.3.14 in .github/workflows/quality-checks.yml) auto-merged with zero quality-check jobs actually executed. The next main-deploy run failed on backend-test (shard 1/3). Resolution required two PRs:
- #1727 — fix the CI gap so workflow-only PRs run their own quality checks
- #1728 — fix the latent flaky test that the bun bump surfaced
Timeline (UTC)
| Time | Event |
|---|---|
| 22:08 | Renovate opens #1725. detect-changes evaluates app-code=false (only .github/ changed). quality-checks, build, migration-validation, deploy-preview, e2e-tests, container-security-scan all skipped. |
| 22:08 | ci-gate job short-circuits via if [ "${NEEDS_APP_CI}" != "true" ]; then exit 0. Required check goes green. Renovate auto-merges. |
| 22:08 | main-deploy run 25829269951 starts on the merge commit. backend-test (shard 1/3) fails — quotes_quote_number_idx UNIQUE violation in the CAMT.053 test suite. Main is now red. |
| ~13:00+1d | Re-run of main-deploy shards goes green (test was flaky, not a real bun regression). Main is back. |
| 13:00+1d | #1727 + #1728 opened, reviewed, merged. |
Root cause 1 — CI gap (pr-preview.yml)
The detect-changes job in pr-preview.yml used a paths filter scoped to app code only:

    filters:
      app-code:
        - 'renewa-one/backend/**'
        - 'renewa-one/frontend/**'
        - 'renewa-one/shared/**'
        - 'renewa-one/docker-compose*'
        - 'renewa-one/Dockerfile'
        - 'renewa-one/nginx*'

Every downstream job was gated on if: needs.detect-changes.outputs.needs-app-ci == 'true'. A change to .github/workflows/quality-checks.yml doesn’t trip app-code, so every gated job was skipped. ci-gate then treated NEEDS_APP_CI != "true" as “nothing to check, exit 0” and the required check went green.
Chicken-and-egg: changes to the CI workflow itself never exercised the CI workflow, and Renovate’s auto-merge turned that blind spot into an unvalidated merge to main.
Fix 1 (#1727)
Added a second paths filter ci-workflows matching only the workflows that define the CI pipeline — so changes to them re-trigger it:

    ci-workflows:
      - '.github/workflows/pr-preview.yml'
      - '.github/workflows/quality-checks.yml'
      - '.github/workflows/migration-tests.yml'
      - '.github/workflows/docker-build.yml'
      - '.github/workflows/e2e-tests.yml'
      - '.github/workflows/container-security-scan.yml'
      - '.github/workflows/security-scan.yml'

Downstream gates became if: needs-app-ci == 'true' || needs-ci-workflows == 'true'. ci-gate’s “no app code, exit 0” shortcut is now guarded on both outputs being false. The fix is self-validating: PR #1727 modifies pr-preview.yml, so the new ci-workflows filter fires for itself.
Unrelated workflows (claude.yml, cleanup-*, staging-db-sync.yml, main-deploy.yml, promote-image.yml, security-scan-scheduled.yml, etc.) are deliberately not in the filter — changes to those shouldn’t synthesize fake PR pipelines.
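To make the two-filter behaviour concrete, here is a minimal TypeScript sketch of how a changed-file list maps onto the two outputs. It is a model for illustration only: globToRegExp and detectChanges are hypothetical helpers, the glob handling is a simplification of whatever the real paths filter does, and the ci-workflows list is shortened.

```typescript
// Simplified glob matcher: '**' may span directories, '*' stays inside one path segment.
// This approximates a paths filter for illustration; real matchers handle more cases.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+^${}()|[\]\\?]/g, "\\$&");
  const body = escaped
    .split("**")
    .map((part) => part.replace(/\*/g, "[^/]*"))
    .join(".*");
  return new RegExp(`^${body}$`);
}

// Filter definitions copied from this write-up (ci-workflows shortened).
const FILTERS: Record<string, string[]> = {
  "app-code": [
    "renewa-one/backend/**",
    "renewa-one/frontend/**",
    "renewa-one/shared/**",
    "renewa-one/docker-compose*",
    "renewa-one/Dockerfile",
    "renewa-one/nginx*",
  ],
  "ci-workflows": [
    ".github/workflows/pr-preview.yml",
    ".github/workflows/quality-checks.yml",
  ],
};

// Returns one boolean per filter: true if any changed file matches any of its globs.
function detectChanges(changedFiles: string[]): Record<string, boolean> {
  const outputs: Record<string, boolean> = {};
  for (const [name, globs] of Object.entries(FILTERS)) {
    const patterns = globs.map(globToRegExp);
    outputs[name] = changedFiles.some((file) => patterns.some((p) => p.test(file)));
  }
  return outputs;
}
```

Under this model, the single file touched by #1725 leaves app-code false but sets ci-workflows to true, while a change to claude.yml sets neither.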
Root cause 2 — flaky Date.now() IDs in billing-camt-import.test.ts
createBillingPrereqs() minted unique IDs as:

    const ts = Date.now() + Math.floor(Math.random() * 1000);
    // ...
    quoteNumber: `QCAMT-${ts}`,

The random offset spans only ~1 s (0 to 999 ms), so two calls made within a second of each other can produce the same value. beforeEach cleaned billingInvoices and friends but not quotes, projects, buildings, contacts, so prereq rows piled up across the suite’s four tests. quotes.quote_number has a UNIQUE index. The collision was probabilistic; bun 1.3.14 shifted test timing enough to make it actually fire.
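A rough way to size the risk, assuming Date.now() returns integer milliseconds and Math.random() is uniform: two IDs collide exactly when the difference in random offsets cancels the difference in timestamps. The sketch below computes that probability. Both function names are hypothetical, and the second function is a worst-case bound that pretends all IDs were minted in the same millisecond.

```typescript
// Probability that two IDs built as Date.now() + floor(random * windowMs)
// are equal, given the two calls happen dtMs milliseconds apart.
// Of the windowMs^2 offset pairs, exactly (windowMs - |dt|) cancel the time gap.
function pairCollisionProbability(dtMs: number, windowMs = 1000): number {
  const gap = Math.abs(dtMs);
  if (gap >= windowMs) return 0;
  return (windowMs - gap) / (windowMs * windowMs);
}

// Worst case for `count` IDs minted in the same millisecond:
// probability that at least two share a value (birthday-style bound).
function anyCollisionProbability(count: number, windowMs = 1000): number {
  let allDistinct = 1;
  for (let i = 0; i < count; i++) {
    allDistinct *= (windowMs - i) / windowMs;
  }
  return 1 - allDistinct;
}
```

The per-pair chance peaks at 1 in 1000 for back-to-back calls and drops to zero once the calls are a second or more apart, which is why a small timing shift can move a test from "never observed" to "fails occasionally".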
Fix 2 (#1728)
Replaced Date.now()-derived suffixes with randomUUID()-based suffixes for format-unconstrained fields (quote numbers, contact emails, file names, etc.). Two false starts along the way are worth retaining as lessons:
False start A — cascading delete of contacts in beforeEach
First commit added db.delete(contacts) (plus projects, buildings, quotes) to stop prereq accumulation. users.contact_id references contacts.id with ON DELETE CASCADE — so wiping contacts also wiped the test admin and every active session in the shared test DB. Symptoms: exactMatches: 0 (auth context lost mid-test) and a noisy Failed to recompute building account log.
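The blast radius is easier to see in a toy model. The following TypeScript sketch uses plain arrays as a hypothetical stand-in for the three tables (it is not the project's schema or ORM code) and applies the two cascades described in this write-up: users follow contacts, sessions follow users.

```typescript
// Hypothetical in-memory model of three tables linked by ON DELETE CASCADE.
interface Contact { id: string }
interface User { id: string; contactId: string }  // users.contact_id -> contacts.id
interface Session { id: string; userId: string }  // sessions cascade from users

interface TestDb { contacts: Contact[]; users: User[]; sessions: Session[] }

// Delete the given contacts and apply both cascades, returning a new snapshot.
function deleteContacts(db: TestDb, contactIds: ReadonlySet<string>): TestDb {
  const contacts = db.contacts.filter((c) => !contactIds.has(c.id));
  const users = db.users.filter((u) => !contactIds.has(u.contactId));
  const liveUsers = new Set(users.map((u) => u.id));
  const sessions = db.sessions.filter((s) => liveUsers.has(s.userId));
  return { contacts, users, sessions };
}

const sharedDb: TestDb = {
  contacts: [{ id: "c-admin" }, { id: "c-fixture" }],
  users: [{ id: "u-admin", contactId: "c-admin" }],
  sessions: [{ id: "s-1", userId: "u-admin" }],
};

// Table-wide wipe: the admin user and its session disappear with the contacts.
const wiped = deleteContacts(sharedDb, new Set(sharedDb.contacts.map((c) => c.id)));

// ID-scoped delete: only the fixture contact goes; the auth rows survive.
const scoped = deleteContacts(sharedDb, new Set(["c-fixture"]));
```

In the model, the table-wide wipe leaves no users or sessions at all, whereas the ID-scoped delete removes one contact and nothing else.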
Lesson: with randomUUID() IDs, accumulated prereq rows are inert — they can’t collide on UNIQUE columns with this test’s freshly-generated values. No cleanup needed. The simplest possible fix is also the safest, and the original beforeEach (billing tables only) was already correct.
False start B — invoice number regex coupling
Second commit kept randomUUID() for invoice numbers too: RE-EB-2026-${uniqueSuffix()} → RE-EB-2026-abc123def456. The CAMT matcher’s Pass 2 uses a strict regex:

    // payment-matcher.ts:18
    const INVOICE_NUMBER_PATTERN = /RE-(?:[A-Z]{2}-\d{4}-\d{5}|\d{4}-\d{4,5})/g;

A 12-char hex suffix doesn’t match \d{5}. The matcher couldn’t find the number in the Verwendungszweck (the payment reference text of the bank entry), demoted the entry to fuzzy, and AC-4.2a asserted exactMatches: 1 against a received 0.
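The mismatch can be reproduced in isolation. The sketch below reuses the pattern quoted above; extractInvoiceNumbers is a hypothetical wrapper for illustration, not the matcher's real function, and the sample reference strings are made up.

```typescript
// Pattern as quoted from payment-matcher.ts in this write-up.
const INVOICE_NUMBER_PATTERN = /RE-(?:[A-Z]{2}-\d{4}-\d{5}|\d{4}-\d{4,5})/g;

// Hypothetical wrapper: returns every invoice number found in a payment reference.
// String.prototype.match with a /g pattern returns all matches, or null if none.
function extractInvoiceNumbers(paymentReference: string): string[] {
  return paymentReference.match(INVOICE_NUMBER_PATTERN) ?? [];
}

const numericSuffix = extractInvoiceNumbers("Zahlung RE-EB-2026-00042 vielen Dank");
const hexSuffix = extractInvoiceNumbers("Zahlung RE-EB-2026-abc123def456");
const digitLedHex = extractInvoiceNumbers("Zahlung RE-EB-2026-123456abcdef");
```

Here numericSuffix contains the full invoice number and hexSuffix is empty, which is the exactMatches: 0 path. Note the third case: because the pattern is unanchored, a hex suffix that happens to start with five digits still matches, but only as the truncated RE-EB-2026-12345, so UUID-derived invoice numbers would fail in more than one way.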
Lesson: test data format is coupled to production parsers. When a production component imposes a format constraint (regex, schema, prefix), test fixtures must satisfy that constraint or the test exercises a different code path than intended. The final fix split helpers:

    import { randomUUID } from 'node:crypto';

    function uniqueSuffix(): string { // unconstrained
      return randomUUID().replace(/-/g, '').slice(0, 12);
    }
    function invoiceNumberSuffix(): string { // must satisfy INVOICE_NUMBER_PATTERN
      return String(Math.floor(Math.random() * 100000)).padStart(5, '0');
    }

A 100k uniqueness space is fine here: beforeEach resets billingInvoices and each test creates ≤ 2 invoices.
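One way to keep this coupling from regressing silently is a guard that checks generated fixture values against the production pattern. The following is a sketch under assumptions: isRecognisedInvoiceNumber is a hypothetical helper, the pattern is copied here rather than imported from payment-matcher.ts, and the RE-EB-2026- prefix is taken from the example above.

```typescript
// Pattern as quoted from payment-matcher.ts in this write-up
// (a real guard would import it instead of copying it).
const INVOICE_NUMBER_PATTERN = /RE-(?:[A-Z]{2}-\d{4}-\d{5}|\d{4}-\d{4,5})/g;

// Helper from the final fix: five zero-padded digits.
function invoiceNumberSuffix(): string {
  return String(Math.floor(Math.random() * 100000)).padStart(5, "0");
}

// Hypothetical guard: true only if the pattern finds exactly one match
// and that match is the whole candidate (no truncation, no leftovers).
function isRecognisedInvoiceNumber(candidate: string): boolean {
  const matches = candidate.match(INVOICE_NUMBER_PATTERN) ?? [];
  return matches.length === 1 && matches[0] === candidate;
}

// Every generated fixture number should be recognised in full.
function allGeneratedNumbersRecognised(samples: number): boolean {
  for (let i = 0; i < samples; i++) {
    if (!isRecognisedInvoiceNumber(`RE-EB-2026-${invoiceNumberSuffix()}`)) return false;
  }
  return true;
}
```

With at most two invoices per test, the chance that both draw the same five-digit suffix is 1 in 100,000, which is why the narrow space is acceptable only together with the beforeEach reset.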
Generalisable lessons
- Filters that gate CI must include the files that define CI. Otherwise the system can never validate changes to itself, and any auto-merge mechanism (Renovate, Mergify) consummates the gap. Pattern to copy: app-code plus a small ci-workflows filter scoped to the workflows that define the pipeline (only those, not every YAML in .github/).
- ci-gate shortcuts are honey traps. “Required check passes when nothing ran” is the worst failure mode: it converts “no signal” into “green signal”. Any future early-exit shortcut in a required check needs to enumerate all axes of “nothing to check”; adding a new axis to detect-changes means updating the shortcut.
- onDelete: cascade makes table-truncating cleanup unsafe in shared test DBs. users.contact_id (Human FK Rule) cascades from contacts; sessions cascade from users. Wiping any “leaf” table can take out auth infrastructure. Prefer ID-scoped deletes, or generate unique-by-construction IDs (randomUUID) so accumulation is harmless.
- randomUUID() is the default for test IDs, not Date.now() and not Math.random()-without-padding. Reach for narrower spaces only when a production parser forces it (and document the coupling in a comment, as we now do for invoiceNumberSuffix).
- Latent flakes surface under dependency changes. A test that passes on bun 1.3.13 and fails on 1.3.14 isn’t a bun regression; it’s a probabilistic bug whose distribution shifted. Treat the dep bump as the trigger, not the cause, and fix the test.
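The second lesson, a gate that enumerates every axis, can be sketched as a pure function. This is a hypothetical model and not the actual ci-gate script: the GateInput shape is assumed, the job result values mirror the ones GitHub Actions reports for needs.<job>.result, and treating every skipped job as a failure is a simplification (a real gate may allow some jobs to skip legitimately).

```typescript
type JobResult = "success" | "failure" | "cancelled" | "skipped";

interface GateInput {
  axes: Record<string, boolean>;   // every detect-changes output, e.g. app-code, ci-workflows
  jobs: Record<string, JobResult>; // results of the jobs the gate depends on
}

function ciGate({ axes, jobs }: GateInput): "pass" | "fail" {
  const axisValues = Object.values(axes);
  // No axes reported at all is a wiring error, not "nothing to check".
  if (axisValues.length === 0) return "fail";
  // The shortcut iterates over all axes, so a newly added axis is covered automatically.
  const anythingToCheck = axisValues.some(Boolean);
  if (!anythingToCheck) return "pass";
  // At least one axis fired: every gated job must have run and succeeded.
  return Object.values(jobs).every((r) => r === "success") ? "pass" : "fail";
}
```

Applied to the #1725 scenario with the new filter, axes would be { "app-code": false, "ci-workflows": true } and all jobs skipped, so the gate fails instead of going green.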
See also
- CI-CD Workflows: workflow inventory and ci-gate semantics
- Git Workflow: branch protection rules + required status checks
- Claude Contributor Prompt: Human FK Rule and the onDelete integrity defaults that made fix #2 tricky