Client overview
- Industry focus: Retail media technology
- Portfolio segment: SaaS / Enterprise
- Organization profile: Retail media SaaS division inside big-box retailer tech org, ~220 engineers
Monolith deployments required full-stack freeze weekends, and marketing feature teams missed seasonal campaign windows. Arguments over data ownership paralyzed the roadmap: was the campaign entity owned by promotions, pricing, or content? Observability lagged, leaving on-call engineers chasing ghosts across opaque shared libraries.
Problem
Monolithic release coupling and unclear boundaries slowed innovation and amplified outage blast radius.
Shared database tables meant schema migrations serialized across squads; hot rows from reporting queries starved transactional paths during holiday peaks.
Retry storms from synchronous HTTP chains masqueraded as downstream vendor incidents. Cache invalidation bugs propagated silently across modules.
Executive impatience made a risky full "rewrite" tempting, even though that approach had already failed twice in adjacent divisions.
Solution
Strangler ingress routing traffic to extracted services behind stable APIs; Kafka as integration backbone; database ownership per service with CDC where replication needed; SLO-based rollout gates.
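The strangler routing idea can be illustrated with a minimal sketch. It is written in Python for brevity and is not the program's actual Envoy configuration; the upstream names and path prefixes are hypothetical.

```python
from dataclasses import dataclass, field

LEGACY_UPSTREAM = "legacy-monolith"  # hypothetical name for the monolith upstream


@dataclass
class StranglerRouter:
    """Sends extracted path prefixes to new services; everything else stays on the monolith."""

    routes: dict[str, str] = field(default_factory=dict)

    def extract(self, prefix: str, service: str) -> None:
        # Register a prefix once its service is ready to take traffic.
        self.routes[prefix] = service

    def resolve(self, path: str) -> str:
        # Match on whole path segments so "/catalogue" does not hit "/catalog".
        matches = [p for p in self.routes if path == p or path.startswith(p + "/")]
        if not matches:
            return LEGACY_UPSTREAM
        # Longest prefix wins, so a narrower route can be extracted independently.
        return self.routes[max(matches, key=len)]


router = StranglerRouter()
router.extract("/catalog", "catalog-service")  # hypothetical route
```

Callers keep using the same public paths throughout; only the routing table changes as each module is extracted.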
Domain mapping workshops produced canonical events (CampaignPublished, InventoryReserved) with JSON Schema contracts and compatibility CI. Edge gateway terminated TLS and enforced authZ centrally.
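A compatibility check of the kind such CI might run can be sketched as follows. This is a simplified illustration in plain Python, not the program's actual tooling: it only inspects top-level `properties`, `required`, and `type`, and the field names (`campaignId`, `publishedAt`, `channel`) are assumptions made for the example.

```python
def breaking_changes(old: dict, new: dict) -> list[str]:
    """Lists changes that would break consumers still reading the old schema."""
    problems = []
    old_props = old.get("properties", {})
    new_props = new.get("properties", {})
    new_required = new.get("required", [])

    # Fields that consumers were promised must remain present and required.
    for name in old.get("required", []):
        if name not in new_props:
            problems.append(f"removed required field: {name}")
        elif name not in new_required:
            problems.append(f"field no longer required: {name}")

    # A field that survives must keep its declared type.
    for name, spec in old_props.items():
        if name in new_props and new_props[name].get("type") != spec.get("type"):
            problems.append(f"type changed: {name}")
    return problems


# Hypothetical versions of a CampaignPublished contract.
V1 = {
    "properties": {"campaignId": {"type": "string"}, "publishedAt": {"type": "string"}},
    "required": ["campaignId", "publishedAt"],
}
V2 = {  # adds an optional field: compatible
    "properties": {**V1["properties"], "channel": {"type": "string"}},
    "required": ["campaignId", "publishedAt"],
}
V3 = {  # changes a type: breaking
    "properties": {"campaignId": {"type": "integer"}, "publishedAt": {"type": "string"}},
    "required": ["campaignId", "publishedAt"],
}
```

A pipeline step could fail the build whenever `breaking_changes` returns a non-empty list for a proposed schema version.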
Services adopted hexagonal boundaries; legacy DB access wrapped in repositories slated for strangulation. Outbox pattern ensured reliable publishing without dual-write anomalies.
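The outbox pattern can be sketched with an in-memory SQLite database standing in for the service's own store. This is an illustration under assumptions, not the program's code: the table layout, the `send` callback, and the polling relay are hypothetical, and in practice a CDC connector could ship the outbox rows instead of a poller.

```python
import json
import sqlite3


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE campaigns (id TEXT PRIMARY KEY, status TEXT NOT NULL);
        CREATE TABLE outbox (
            seq INTEGER PRIMARY KEY AUTOINCREMENT,
            event_type TEXT NOT NULL,
            payload TEXT NOT NULL,
            published INTEGER NOT NULL DEFAULT 0
        );
        """
    )
    return conn


def publish_campaign(conn: sqlite3.Connection, campaign_id: str) -> None:
    # One local transaction covers the state change and the event record,
    # so neither can be committed without the other (no dual write).
    with conn:
        conn.execute(
            "INSERT INTO campaigns (id, status) VALUES (?, 'PUBLISHED')", (campaign_id,)
        )
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("CampaignPublished", json.dumps({"campaignId": campaign_id})),
        )


def relay_once(conn: sqlite3.Connection, send) -> int:
    """Forwards unpublished outbox rows in order; returns how many were sent."""
    rows = conn.execute(
        "SELECT seq, event_type, payload FROM outbox WHERE published = 0 ORDER BY seq"
    ).fetchall()
    for seq, event_type, payload in rows:
        send(event_type, json.loads(payload))  # if this raises, the row is retried later
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE seq = ?", (seq,))
    return len(rows)
```

Because a row is marked published only after `send` returns, a crash between the two steps leads to a resend. Delivery is therefore at-least-once, which is why consumers need to be idempotent.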
Progressively extracting read models into CQRS projections reduced reporting contention without requiring a big-bang warehouse migration up front.
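A read-model projection of this kind can be sketched as a fold over the event stream. The event shapes (`type`, `campaignId`, `units`) and the view columns are assumptions for illustration only.

```python
from collections import defaultdict


def project(events) -> dict:
    """Folds domain events into a per-campaign reporting view."""
    view = defaultdict(lambda: {"published": False, "reserved_units": 0})
    for event in events:
        row = view[event["campaignId"]]
        if event["type"] == "CampaignPublished":
            row["published"] = True
        elif event["type"] == "InventoryReserved":
            row["reserved_units"] += event["units"]
        # Unknown event types are ignored so new events do not break the projection.
    return dict(view)


# Hypothetical event stream.
EVENTS = [
    {"type": "CampaignPublished", "campaignId": "c-1"},
    {"type": "InventoryReserved", "campaignId": "c-1", "units": 5},
    {"type": "InventoryReserved", "campaignId": "c-1", "units": 3},
]
```

Reporting queries then read the projected view rather than the transactional tables, which is what takes the contention off the write path.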
Implementation
1. Boundary carving & ROI sequencing
Prioritized modules by change frequency × incident pain; catalog service extracted first due to omnichannel reuse.
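The scoring rule can be sketched as a simple product and sort. The module names, units, and values below are placeholders, not program data, and as the catalog decision shows, the score was one input alongside considerations such as reuse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    name: str
    changes_per_quarter: int  # hypothetical proxy for change frequency
    incident_pain: float      # hypothetical pain score, higher is worse


def extraction_order(modules: list[Module]) -> list[Module]:
    # Rank by change frequency x incident pain, highest first.
    return sorted(
        modules,
        key=lambda m: m.changes_per_quarter * m.incident_pain,
        reverse=True,
    )


# Placeholder inputs for illustration only.
CANDIDATES = [
    Module("module-a", changes_per_quarter=40, incident_pain=1.5),
    Module("module-b", changes_per_quarter=10, incident_pain=9.0),
    Module("module-c", changes_per_quarter=5, incident_pain=2.0),
]
```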
2. Traffic migration waves
Shadow traffic compared responses; progressive percentage cutovers with automated rollback when divergence detected.
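The cutover mechanics can be sketched in three small functions. The wave percentages and the divergence threshold are assumptions, not the program's actual settings.

```python
import hashlib

WAVES = (1, 10, 50, 100)  # hypothetical cutover percentages


def in_cutover(request_id: str, percent: int) -> bool:
    # Stable hashing keeps a given request id on the same side across retries.
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    return bucket < percent


def divergence_rate(pairs) -> float:
    """pairs: (legacy_response, candidate_response) tuples from shadow traffic."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(1 for legacy, candidate in pairs if legacy != candidate) / len(pairs)


def next_percent(current: int, rate: float, max_rate: float, waves=WAVES) -> int:
    # Roll back to 0% as soon as divergence exceeds the allowed rate.
    if rate > max_rate:
        return 0
    later = [w for w in waves if w > current]
    return later[0] if later else current
```

For example, `next_percent(10, divergence_rate(samples), 0.01)` advances to the 50% wave only while fewer than 1% of shadowed responses differ.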
3. Operational maturity
Service templates baked tracing; synthetic journeys validated after each extraction; dependency maps visualized blast radius.
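Blast radius on a dependency map can be computed as the set of services that depend, directly or transitively, on a failed one. The sketch below assumes a plain adjacency mapping; the service names are hypothetical.

```python
from collections import deque


def blast_radius(depends_on: dict[str, set[str]], failed: str) -> set[str]:
    """Returns every service that transitively depends on `failed`."""
    # Invert the edges: for each service, who calls it?
    dependents: dict[str, set[str]] = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(service)

    affected: set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for caller in dependents.get(current, ()):
            if caller != failed and caller not in affected:
                affected.add(caller)
                queue.append(caller)
    return affected


# Hypothetical dependency map: service -> services it calls.
DEPENDS_ON = {
    "campaign-ui": {"campaign-service"},
    "campaign-service": {"catalog-service", "pricing-service"},
    "reporting": {"catalog-service"},
    "catalog-service": set(),
    "pricing-service": set(),
}
```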
Tools & platforms
- Kafka
- Debezium CDC
- Envoy
- Terraform
- OpenTelemetry
- Backstage service catalog
Engineering challenges addressed
- Teaching teams to be skeptical of distributed transactions; sagas with compensating actions and strict idempotency took their place.
- Keeping shared UI packages compatible during parallel version drift.
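The saga and idempotency discipline from the first challenge above can be sketched as follows. This is a minimal in-process illustration, not the program's implementation: the step callables and message ids are hypothetical, and a real consumer would keep seen ids in durable storage.

```python
def run_saga(steps) -> bool:
    """steps: list of (action, compensation) callables. Returns True if all succeed."""
    completed = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            # Undo completed steps in reverse order instead of relying on a
            # distributed transaction to roll everything back.
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True


class IdempotentHandler:
    """Wraps a handler so that redelivered messages are processed only once."""

    def __init__(self, handler):
        self._handler = handler
        self._seen = set()  # in-memory for the sketch only

    def handle(self, message_id: str, payload) -> bool:
        if message_id in self._seen:
            return False  # duplicate delivery, skip
        self._handler(payload)
        self._seen.add(message_id)
        return True
```

The two pieces work together: with at-least-once messaging, both saga steps and their compensations may be delivered more than once, so each needs to be safe to repeat.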
Tech stack
- Java
- Spring Boot
- Kafka
- PostgreSQL
- Kubernetes
- Envoy
- Terraform
- AWS
- OpenTelemetry
Results
- Median lead time for changes down 38% for the post-extraction cohort vs. baseline
- Major incidents per deployment down 54% on a rolling-quarter basis
- Peak-weekend freeze events eliminated after wave 3 cutover
Quantified impact
38% faster lead time
DORA metrics sourced from deployment API + issue linkage.
54% fewer incidents per deploy
Normalized by deploy count increase.
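Normalizing by deploy count matters because raw incident totals can rise while the per-deploy rate falls. The arithmetic is sketched below with made-up inputs that are not the program's data.

```python
def incidents_per_deploy(incidents: int, deploys: int) -> float:
    if deploys <= 0:
        raise ValueError("deploys must be positive")
    return incidents / deploys


def relative_change(baseline: float, current: float) -> float:
    # Negative values mean the rate went down.
    return (current - baseline) / baseline


# Made-up numbers for illustration only: incidents rise from 30 to 36,
# but deploys double from 120 to 240, so the rate per deploy falls.
baseline_rate = incidents_per_deploy(30, 120)
current_rate = incidents_per_deploy(36, 240)
change = relative_change(baseline_rate, current_rate)
```

In this hypothetical case the raw incident count is 20% higher, yet the normalized rate is lower, which is the comparison the metric above reports.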
Key takeaways
- Microservices are an organizational refactoring — architecture diagrams alone do not reduce coupling.
- Event contracts deserve product management — versioning neglect creates distributed monolith regret.
- Measure extraction ROI in lead time and incident dollars — not microservice count vanity.
