Case Study

The Knot

Untangling a brittle legacy platform and restoring release confidence.

Context

The core transaction system had grown for a decade with inconsistent ownership and heavy coupling. The team had stopped trusting deployments, and roadmap work was repeatedly delayed by stability incidents.

Constraints

  • No prolonged feature freeze; commercial commitments remained active.
  • Limited staffing for net-new platform work.
  • Strict reliability requirements during peak customer periods.

Options Considered

  1. Full rewrite: high long-term upside, unacceptable near-term risk.
  2. Status quo patching: low immediate cost, guaranteed continued fragility.
  3. Phased extraction behind stable interfaces: balanced risk and momentum.

Chosen Approach

We selected phased extraction. I aligned engineering and leadership on a sequence: isolate high-volatility modules first, define interface contracts, migrate incrementally, and instrument each stage before further rollout.

Outcomes

  • Rollback rate reduced from 17% to 3% over one quarter.
  • Median incident resolution time improved by 38%.
  • Team resumed weekly releases with predictable quality.

What I'd Do Differently

I would establish cross-team ownership boundaries earlier. We solved this later with clearer service charters, but introducing them at kickoff would have reduced coordination drag.