Microservices Migration Project

A retail media monolith split into bounded services behind an event backbone: strangler routing, incremental data ownership, and operational guardrails prevented the classic big-bang distributed-rewrite failure mode.

Client overview

Industry focus
Retail media technology
Portfolio segment
SaaS / Enterprise
Organization profile
Retail media SaaS division inside big-box retailer tech org, ~220 engineers

Monolith deployments required full-stack freeze weekends, and marketing feature teams missed seasonal campaign windows. Data-ownership arguments paralyzed the roadmap: was the campaign entity owned by promotions, pricing, or content? Observability lagged, and on-call engineers chased ghosts across opaque shared libraries.

Problem

Monolithic release coupling and unclear boundaries slowed innovation and amplified outage blast radius.

Shared database tables meant schema migrations serialized across squads; hot rows from reporting queries starved transactional paths during holiday peaks.

Retry storms from synchronous HTTP chains masqueraded as downstream vendor incidents. Cache invalidation bugs propagated silently across modules.

Executive impatience tempted a risky ground-up rewrite, an approach that had already failed twice in adjacent divisions.

Solution

Strangler ingress routing sent traffic to extracted services behind stable APIs; Kafka served as the integration backbone; each service owned its own database, with CDC where replication was needed; SLO-based gates controlled rollout.

Domain-mapping workshops produced canonical events (CampaignPublished, InventoryReserved) with JSON Schema contracts and compatibility checks in CI. An edge gateway terminated TLS and enforced authorization centrally.
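A minimal sketch of the kind of rule a compatibility CI gate might enforce, using a simplified stand-in for real JSON Schema tooling (a schema is reduced to field name → required flag; the rule, field names, and class name are illustrative assumptions):

```java
import java.util.Map;

// Simplified event-contract compatibility check: a new schema version may add
// optional fields, but must not remove fields or introduce new required ones.
// This is an illustrative stand-in for a full JSON Schema compatibility tool.
public class ContractCompat {
    // A "schema" here is just field name -> required flag.
    public static boolean backwardCompatible(Map<String, Boolean> oldSchema,
                                             Map<String, Boolean> newSchema) {
        for (String field : oldSchema.keySet()) {
            if (!newSchema.containsKey(field)) return false; // removed field breaks consumers
        }
        for (var e : newSchema.entrySet()) {
            if (!oldSchema.containsKey(e.getKey()) && e.getValue()) {
                return false; // new required field breaks existing events
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Boolean> v1 = Map.of("campaignId", true, "publishedAt", true);
        Map<String, Boolean> v2 = Map.of("campaignId", true, "publishedAt", true,
                                         "channel", false); // optional addition: ok
        Map<String, Boolean> v3 = Map.of("campaignId", true); // dropped field: not ok
        System.out.println(backwardCompatible(v1, v2));
        System.out.println(backwardCompatible(v1, v3));
    }
}
```

Running a check like this on every contract change is what turns event versioning from a code-review argument into a CI gate.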

Services adopted hexagonal boundaries; legacy database access was wrapped in repositories slated for removal as extraction proceeded. The outbox pattern ensured reliable event publishing without dual-write anomalies.
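The outbox pattern can be sketched as follows, with in-memory collections standing in for PostgreSQL tables and a Kafka topic (the class and method names are illustrative, not from the project):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative transactional outbox: the business row and the outbox row
// commit in the same database transaction, so there is no dual-write window.
// A separate relay (e.g., Debezium tailing the outbox table) publishes later.
public class OutboxSketch {
    final List<String> campaignTable = new ArrayList<>();
    final Deque<String> outboxTable = new ArrayDeq<>() {}; // placeholder -- see below
    final List<String> broker = new ArrayList<>(); // stands in for a Kafka topic

    // One logical transaction: both writes succeed or neither does.
    public void publishCampaign(String campaignId) {
        // BEGIN;
        campaignTable.add(campaignId);
        outboxTable.add("CampaignPublished:" + campaignId);
        // COMMIT; -- the broker is never written here, so no dual-write anomaly
    }

    // Relay drains the outbox to the broker, deleting rows only after ack
    // (at-least-once delivery; consumers must be idempotent).
    public void runRelay() {
        while (!outboxTable.isEmpty()) {
            broker.add(outboxTable.poll());
        }
    }

    public static void main(String[] args) {
        OutboxSketch s = new OutboxSketch();
        s.publishCampaign("c-42");
        s.runRelay();
        System.out.println(s.broker);
    }
}
```

The point of the pattern is visible in `publishCampaign`: the broker is never touched inside the transaction, so a crash can delay an event but never lose one or emit it without its database row.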

Progressive extraction of read models into CQRS projections reduced reporting contention without requiring an upfront big-bang warehouse migration.
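A projection in this style folds the event stream into a denormalized read model, so reporting queries never touch the transactional tables. A minimal sketch (event names follow the case study; the advertiser-count projection itself is an assumed example):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative CQRS read-model projection: a consumer folds events from the
// backbone into a per-advertiser counter that reporting reads directly,
// keeping hot reporting queries off the transactional database.
public class CampaignStatsProjection {
    private final Map<String, Integer> publishedPerAdvertiser = new HashMap<>();

    public void apply(String eventType, String advertiserId) {
        if ("CampaignPublished".equals(eventType)) {
            publishedPerAdvertiser.merge(advertiserId, 1, Integer::sum);
        }
        // Other event types feed other projections; unknown types are ignored,
        // which is what makes adding new events backward compatible.
    }

    public int publishedCount(String advertiserId) {
        return publishedPerAdvertiser.getOrDefault(advertiserId, 0);
    }

    public static void main(String[] args) {
        var p = new CampaignStatsProjection();
        p.apply("CampaignPublished", "acme");
        p.apply("CampaignPublished", "acme");
        p.apply("InventoryReserved", "acme"); // not our concern: ignored
        System.out.println(p.publishedCount("acme")); // 2
    }
}
```

Because projections are rebuilt from the event log, a wrong one can be thrown away and replayed, which is what made incremental extraction safer than a warehouse migration.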

Implementation

  1. Boundary carving & ROI sequencing

     Prioritized modules by change frequency × incident pain; the catalog service was extracted first due to omnichannel reuse.

  2. Traffic migration waves

     Shadow traffic compared responses; progressive percentage cutovers rolled back automatically when divergence was detected.

  3. Operational maturity

     Service templates baked in tracing; synthetic journeys validated each extraction; dependency maps visualized blast radius.
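The divergence check behind the traffic-migration waves can be sketched simply: mirror each request to both the legacy path and the extracted service, compare responses, and advance the cutover percentage only while divergence stays under a threshold (threshold, names, and response encoding are assumptions for illustration):

```java
import java.util.List;

// Illustrative shadow-traffic gate for progressive cutover: responses from
// the legacy monolith and the extracted service are compared pairwise, and
// the cutover only advances while the divergence rate stays under a threshold.
public class ShadowGate {
    public static double divergenceRate(List<String> legacy, List<String> extracted) {
        if (legacy.size() != extracted.size()) {
            throw new IllegalArgumentException("response streams out of sync");
        }
        int diverged = 0;
        for (int i = 0; i < legacy.size(); i++) {
            if (!legacy.get(i).equals(extracted.get(i))) diverged++;
        }
        return legacy.isEmpty() ? 0.0 : (double) diverged / legacy.size();
    }

    public static boolean safeToAdvance(List<String> legacy, List<String> extracted,
                                        double threshold) {
        return divergenceRate(legacy, extracted) <= threshold;
    }

    public static void main(String[] args) {
        var legacy    = List.of("200:ok", "200:ok", "200:ok", "404:miss");
        var extracted = List.of("200:ok", "200:ok", "200:ok", "200:ok");
        System.out.println(divergenceRate(legacy, extracted)); // one of four differs
        System.out.println(safeToAdvance(legacy, extracted, 0.01)); // gate fails: roll back
    }
}
```

In practice the comparison ignores nondeterministic fields (timestamps, trace IDs) before diffing; the gate logic itself stays this simple.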

Tools & platforms

  • Kafka
  • Debezium CDC
  • Envoy
  • Terraform
  • OpenTelemetry
  • Backstage service catalog

Engineering challenges addressed

  • Teaching teams healthy skepticism of distributed transactions, compensating instead with sagas and idempotency discipline.
  • Keeping shared UI packages compatible during parallel version drift.
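The idempotency discipline mentioned above reduces to one habit: since the backbone delivers at-least-once, every handler records processed event IDs and skips repeats. A minimal sketch (the handler, event names, and in-memory ID set are illustrative; a real service persists the ID set with the state change in one transaction):

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative idempotent consumer: at-least-once delivery means duplicates
// will arrive, so the handler deduplicates on event ID before applying the
// side effect. Redelivered events are acknowledged but change nothing.
public class IdempotentConsumer {
    private final Set<String> processed = new HashSet<>();
    private int reservations = 0;

    // Returns true if the event was applied, false if it was a duplicate.
    public boolean handleInventoryReserved(String eventId) {
        if (!processed.add(eventId)) return false; // duplicate: safe to ignore
        reservations++;                            // side effect runs exactly once
        return true;
    }

    public int reservations() { return reservations; }

    public static void main(String[] args) {
        var c = new IdempotentConsumer();
        c.handleInventoryReserved("evt-1");
        c.handleInventoryReserved("evt-1"); // broker redelivery
        c.handleInventoryReserved("evt-2");
        System.out.println(c.reservations()); // 2
    }
}
```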

Tech stack

  • Java
  • Spring Boot
  • Kafka
  • PostgreSQL
  • Kubernetes
  • Envoy
  • Terraform
  • AWS
  • OpenTelemetry

Results

  • Median lead time for changes down 38% for the post-extraction cohort vs. baseline
  • Major incidents per deployment down 54% on a rolling-quarter basis
  • Peak-weekend freeze events eliminated after wave 3 cutover

Quantified impact

  • 38% faster lead time

    DORA metrics sourced from the deployment API and issue linkage.

  • 54% fewer incidents per deploy

    Normalized for the increase in deploy count.

Key takeaways

  • Microservices are an organizational refactoring: architecture diagrams alone do not reduce coupling.
  • Event contracts deserve product management; versioning neglect creates distributed-monolith regret.
  • Measure extraction ROI in lead time and incident dollars, not microservice-count vanity.

Book a free consultation; we respond within one business day.
