Client overview
- Industry focus: Marketplace
- Portfolio segment: SaaS / Enterprise
- Organization profile: Regional C2C marketplace, 18M listings, diversified categories
Static category boosts favored incumbents, and sellers complained that discovery buried new inventory. Regulatory scrutiny of algorithmic transparency increased after competitor fines abroad. Product leadership wanted measurable incremental GMV without a black-box ML backlash.
Problem
- Heuristic ranking capped GMV and obscured fairness; offline models drifted quickly against seasonal inventory.
- Cold-start listings rarely surfaced, despite quality signals available from linked seller history on other platforms.
- Offline batch recommendations conflicted with real-time inventory locks, frustrating buyers.
- Data science lacked experimentation plumbing; engineers shipped "shadow" weights manually via config flags.
Solution
Two-stage retrieve-then-rank pipeline with embeddings, contextual bandits for exploration, fairness constraints, real-time inventory filters, and automated offline/online evaluation harness.
Candidate generation blended collaborative filtering residuals with transformer-lite embeddings trained on anonymized interaction sequences; ANN index on managed vector DB with replication per region.
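The retrieval step can be sketched in miniature. The snippet below is a brute-force cosine-similarity stand-in for the ANN index (the real path queries the managed vector DB replica per region); the function name and arguments are illustrative, not the client's actual API.

```python
import numpy as np

def retrieve_candidates(session_embedding, listing_embeddings, listing_ids, k=200):
    """Return the top-k listings by cosine similarity to the session embedding.

    Brute-force stand-in for the ANN index: normalize vectors so dot
    products are cosine similarities, then take the k highest scores.
    """
    q = session_embedding / np.linalg.norm(session_embedding)
    m = listing_embeddings / np.linalg.norm(listing_embeddings, axis=1, keepdims=True)
    scores = m @ q
    top = np.argsort(-scores)[:k]
    return [(listing_ids[i], float(scores[i])) for i in top]
```

In production the same top-k contract holds, but the exact scan is replaced by an approximate index so latency stays flat as the corpus grows.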
The bandit layer optimized the exploration budget per session class; fairness constraints penalized disproportionate demographic skew, detected via proxy auditing buckets agreed upon with legal.
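A minimal sketch of the scoring idea: a per-session exploration budget boosts a limited number of cold-start candidates, and a fairness penalty discounts over-exposed proxy buckets. All field names (`base_score`, `audit_bucket`, `cold_start`) and the deterministic boost are illustrative assumptions, not the client's actual bandit.

```python
def score_with_exploration(candidates, exploration_budget, exploration_boost, skew_penalty):
    """Rank candidates, spending a capped exploration budget on cold-start items.

    `candidates`: list of dicts with hypothetical fields `base_score`
    (ranker output), `audit_bucket` (proxy fairness bucket), `cold_start`.
    `skew_penalty(bucket)` returns a score penalty for over-exposed buckets.
    """
    scored = []
    explored = 0
    for c in candidates:
        s = c["base_score"] - skew_penalty(c["audit_bucket"])
        # Boost cold-start items until the per-session budget is spent.
        if c["cold_start"] and explored < exploration_budget:
            s += exploration_boost
            explored += 1
        scored.append((s, c))
    scored.sort(key=lambda t: -t[0])
    return [c for _, c in scored]
```

A real contextual bandit would sample the boost from a posterior rather than apply it deterministically, but the budget cap and penalty shape are the point here.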
The serving path hit a Redis-backed feature store with staleness SLAs, falling back to deterministic ranking if drift detectors fired.
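The fallback logic can be sketched as a guard around the model path: if the drift detector has fired or any feature breaches the staleness SLA, serve a deterministic recency ranking instead. The 30-second SLA, field names, and function signature below are illustrative assumptions.

```python
import time

STALENESS_SLA_S = 30.0  # illustrative SLA, not the client's actual value

def rank_listings(candidates, features, drift_fired, now=None):
    """Serve model scores unless drift fired or features are stale.

    `features` maps listing_id -> (score, fetched_at_ts); on fallback,
    rank deterministically by listing recency instead of model score.
    """
    now = time.time() if now is None else now
    stale = any(now - ts > STALENESS_SLA_S for _, ts in features.values())
    if drift_fired or stale:
        return sorted(candidates, key=lambda c: -c["published_at"])
    return sorted(candidates, key=lambda c: -features[c["id"]][0])
```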
Implementation
1. Measurement discipline: defined incremental metrics with a holdout-geography design; an economist reviewed the seasonality assumptions. The logging schema captured propensity weights for audit.
2. Safe gradual rollout: shadow mode replayed decisions against the baseline; progressive traffic ramps ran with a kill switch tied to GMV guardrails.
3. Seller communication: a transparency center explained the non-personal signals used; ranking disputes were routed with ticket IDs correlated to ranking snapshots.
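The ramp-with-kill-switch step above can be sketched as a small state machine. The stage fractions and the guardrail threshold are illustrative placeholders, not the client's actual policy.

```python
RAMP_STAGES = [0.01, 0.05, 0.20, 0.50, 1.0]  # illustrative traffic fractions

def next_ramp_stage(current_fraction, gmv_delta_pct, guardrail_pct=-1.0):
    """Advance the rollout one stage, or kill it if the GMV guardrail trips.

    `gmv_delta_pct` is treatment-vs-control GMV in percent; a reading
    below `guardrail_pct` reverts all traffic to the baseline ranker.
    Returns (new_fraction, killed).
    """
    if gmv_delta_pct < guardrail_pct:
        return 0.0, True  # kill switch: all traffic back to baseline
    idx = RAMP_STAGES.index(current_fraction)
    if idx + 1 < len(RAMP_STAGES):
        return RAMP_STAGES[idx + 1], False
    return current_fraction, False  # already at full traffic
```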
Tools & platforms
- PyTorch
- Ray
- FAISS-compatible ANN service
- MLflow
- Feast feature store
Engineering challenges addressed
- Latency budget vs. embedding dimensionality: distilled student models closed the gap.
- Fairness-definition negotiation across legal and product: trade-offs documented explicitly.
Tech stack
- Python
- PyTorch
- Ray
- Redis
- Kafka
- Kubernetes
- AWS
- Snowflake
- MLflow
Results
- +14.3% incremental GMV in masked holdout regions vs. control
- Cold-start listings median impressions +37% first week post-publish
- Seller fairness complaints down 41% vs. prior heuristic era baseline
Quantified impact
- +14.3% incremental GMV: causal estimate from geographic holdouts with difference-in-differences robustness checks.
- p95 ranking latency of 86 ms: including ANN retrieval and bandit scoring at peak traffic.
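The difference-in-differences point estimate behind the GMV figure reduces to a simple computation: subtract the control regions' pre/post change from the treated regions' change, netting out shared seasonality. The function and its inputs below are an illustrative sketch, not the economist's actual model.

```python
def did_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Difference-in-differences point estimate on mean per-region GMV.

    Each argument is a list of per-region GMV figures (illustrative
    units) for the treated/control groups before/after launch.
    """
    mean = lambda xs: sum(xs) / len(xs)
    treated_change = mean(treat_post) - mean(treat_pre)
    control_change = mean(ctrl_post) - mean(ctrl_pre)
    return treated_change - control_change
```

A production estimate would add standard errors and placebo checks; this only shows the core subtraction.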
Key takeaways
- Recommendation ROI requires causal measurement — offline accuracy alone misleads executives.
- Fairness tooling must be productized, not slide-deck promises after PR crises.
- Exploration budgets should be financially capped — unconstrained bandits burn trust quickly.
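The last takeaway can be made concrete: denominate the exploration budget in GMV rather than in impressions, and gate each exploratory action against it. The cap percentage and function names are illustrative assumptions.

```python
def exploration_cap(period_gmv, cap_pct=0.5):
    """Convert a GMV-denominated risk tolerance into a per-period budget:
    the expected GMV the bandit may 'spend' on uncertain rankings.
    The 0.5% default is illustrative, not the client's actual policy."""
    return period_gmv * cap_pct / 100.0

def can_explore(spent_so_far, expected_cost, budget):
    """Gate an exploration action: allow it only if it stays within budget."""
    return spent_so_far + expected_cost <= budget
```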
