Case Study
The Knot
Untangling a brittle legacy platform and restoring release confidence.
Context
The core transaction system had grown for a decade with inconsistent ownership and heavy coupling. The team had stopped trusting deployments, and roadmap work was repeatedly delayed by stability incidents.
Constraints
- No prolonged feature freeze; commercial commitments remained active.
- Limited staffing for net-new platform work.
- Strict reliability requirements during peak customer periods.
Options Considered
- Full rewrite: high long-term upside, unacceptable near-term risk.
- Status quo patching: low immediate cost, guaranteed continued fragility.
- Phased extraction behind stable interfaces: balanced risk and momentum.
Chosen Approach
We selected phased extraction. I aligned engineering and leadership on a sequence: isolate high-volatility modules first, define interface contracts, migrate incrementally, and instrument each stage before further rollout.
Outcomes
- Rollback rate reduced from 17% to 3% over one quarter.
- Median incident resolution time improved by 38%.
- Team resumed weekly releases with predictable quality.
What I'd Do Differently
I would establish cross-team ownership boundaries earlier. We solved this later with clearer service charters, but introducing them at kickoff would have reduced coordination drag.