Client overview
- Industry focus: Marketplace
- Portfolio segment: SaaS / Enterprise
- Organization profile: Regional C2C marketplace, 18M listings, diversified categories
Static category boosts favored incumbents, and sellers complained that discovery buried new inventory. Regulatory scrutiny of algorithmic transparency increased after competitor fines abroad. Product leadership wanted measurable incremental GMV without a black-box ML backlash.
Problem
- Heuristic ranking capped GMV and obscured fairness; offline models drifted quickly against seasonal inventory.
- Cold-start listings rarely surfaced, despite quality signals available from linked seller history on other platforms.
- Offline batch recommendations conflicted with real-time inventory locks, frustrating buyers.
- Data science lacked experimentation plumbing; engineers shipped "shadow" weights manually via config flags.
Solution
Two-stage retrieve-then-rank pipeline with embeddings, contextual bandits for exploration, fairness constraints, real-time inventory filters, and automated offline/online evaluation harness.
Candidate generation blended collaborative filtering residuals with transformer-lite embeddings trained on anonymized interaction sequences; ANN index on managed vector DB with replication per region.
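The retrieval step can be sketched in miniature. The snippet below is a brute-force cosine-similarity stand-in for the ANN index (the real path queries the managed vector DB replica per region); the function name and arguments are illustrative, not the client's actual API.

```python
import numpy as np

def retrieve_candidates(session_embedding, listing_embeddings, listing_ids, k=200):
    """Return the top-k listings by cosine similarity to the session embedding.

    Brute-force stand-in for the ANN index: normalize vectors so dot
    products are cosine similarities, then take the k highest scores.
    """
    q = session_embedding / np.linalg.norm(session_embedding)
    m = listing_embeddings / np.linalg.norm(listing_embeddings, axis=1, keepdims=True)
    scores = m @ q
    top = np.argsort(-scores)[:k]
    return [(listing_ids[i], float(scores[i])) for i in top]
```

In production the same top-k contract holds, but the exact scan is replaced by an approximate index so latency stays flat as the corpus grows.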
The bandit layer optimized the exploration budget per session class; fairness constraints penalized disproportionate demographic skew, detected via proxy auditing buckets agreed upon with legal.
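A minimal sketch of the scoring idea: a per-session exploration budget boosts a limited number of cold-start candidates, and a fairness penalty discounts over-exposed proxy buckets. All field names (`base_score`, `audit_bucket`, `cold_start`) and the deterministic boost are illustrative assumptions, not the client's actual bandit.

```python
def score_with_exploration(candidates, exploration_budget, exploration_boost, skew_penalty):
    """Rank candidates, spending a capped exploration budget on cold-start items.

    `candidates`: list of dicts with hypothetical fields `base_score`
    (ranker output), `audit_bucket` (proxy fairness bucket), `cold_start`.
    `skew_penalty(bucket)` returns a score penalty for over-exposed buckets.
    """
    scored = []
    explored = 0
    for c in candidates:
        s = c["base_score"] - skew_penalty(c["audit_bucket"])
        # Boost cold-start items until the per-session budget is spent.
        if c["cold_start"] and explored < exploration_budget:
            s += exploration_boost
            explored += 1
        scored.append((s, c))
    scored.sort(key=lambda t: -t[0])
    return [c for _, c in scored]
```

A real contextual bandit would sample the boost from a posterior rather than apply it deterministically, but the budget cap and penalty shape are the point here.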
The serving path hit a Redis-backed feature store with staleness SLAs, falling back to deterministic ranking if drift detectors fired.
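The fallback logic can be sketched as a guard around the model path: if the drift detector has fired or any feature breaches the staleness SLA, serve a deterministic recency ranking instead. The 30-second SLA, field names, and function signature below are illustrative assumptions.

```python
import time

STALENESS_SLA_S = 30.0  # illustrative SLA, not the client's actual value

def rank_listings(candidates, features, drift_fired, now=None):
    """Serve model scores unless drift fired or features are stale.

    `features` maps listing_id -> (score, fetched_at_ts); on fallback,
    rank deterministically by listing recency instead of model score.
    """
    now = time.time() if now is None else now
    stale = any(now - ts > STALENESS_SLA_S for _, ts in features.values())
    if drift_fired or stale:
        return sorted(candidates, key=lambda c: -c["published_at"])
    return sorted(candidates, key=lambda c: -features[c["id"]][0])
```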
Implementation
1. Measurement discipline: defined incremental metrics with a holdout-geography design; an economist reviewed the seasonality assumptions. The logging schema captured propensity weights for audit.
2. Safe gradual rollout: shadow mode replayed decisions against the baseline; progressive traffic ramps ran with a kill switch tied to GMV guardrails.
3. Seller communication: a transparency center explained the non-personal signals used; ranking disputes were routed with ticket IDs correlated to ranking snapshots.
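The ramp-with-kill-switch step above can be sketched as a small state machine. The stage fractions and the guardrail threshold are illustrative placeholders, not the client's actual policy.

```python
RAMP_STAGES = [0.01, 0.05, 0.20, 0.50, 1.0]  # illustrative traffic fractions

def next_ramp_stage(current_fraction, gmv_delta_pct, guardrail_pct=-1.0):
    """Advance the rollout one stage, or kill it if the GMV guardrail trips.

    `gmv_delta_pct` is treatment-vs-control GMV in percent; a reading
    below `guardrail_pct` reverts all traffic to the baseline ranker.
    Returns (new_fraction, killed).
    """
    if gmv_delta_pct < guardrail_pct:
        return 0.0, True  # kill switch: all traffic back to baseline
    idx = RAMP_STAGES.index(current_fraction)
    if idx + 1 < len(RAMP_STAGES):
        return RAMP_STAGES[idx + 1], False
    return current_fraction, False  # already at full traffic
```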
Tools & platforms
- PyTorch
- Ray
- FAISS-compatible ANN service
- MLflow
- Feast feature store
Engineering challenges addressed
- Latency budget vs. embedding dimensionality: distilled student models closed the gap.
- Fairness-definition negotiation across legal and product: trade-offs documented explicitly.
Tech stack
- Python
- PyTorch
- Ray
- Redis
- Kafka
- Kubernetes
- AWS
- Snowflake
- MLflow
Results
- +14.3% incremental GMV in masked holdout regions vs. control
- Cold-start listings median impressions +37% first week post-publish
- Seller fairness complaints down 41% vs. prior heuristic era baseline
Quantified impact
- +14.3% incremental GMV: causal estimate from geographic holdouts with difference-in-differences robustness checks.
- p95 ranking latency of 86 ms: including ANN retrieval and bandit scoring at peak traffic.
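The difference-in-differences point estimate behind the GMV figure reduces to a simple computation: subtract the control regions' pre/post change from the treated regions' change, netting out shared seasonality. The function and its inputs below are an illustrative sketch, not the economist's actual model.

```python
def did_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Difference-in-differences point estimate on mean per-region GMV.

    Each argument is a list of per-region GMV figures (illustrative
    units) for the treated/control groups before/after launch.
    """
    mean = lambda xs: sum(xs) / len(xs)
    treated_change = mean(treat_post) - mean(treat_pre)
    control_change = mean(ctrl_post) - mean(ctrl_pre)
    return treated_change - control_change
```

A production estimate would add standard errors and placebo checks; this only shows the core subtraction.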
Key takeaways
- Recommendation ROI requires causal measurement — offline accuracy alone misleads executives.
- Fairness tooling must be productized, not slide-deck promises after PR crises.
- Exploration budgets should be financially capped — unconstrained bandits burn trust quickly.
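The last takeaway can be made concrete: denominate the exploration budget in GMV rather than in impressions, and gate each exploratory action against it. The cap percentage and function names are illustrative assumptions.

```python
def exploration_cap(period_gmv, cap_pct=0.5):
    """Convert a GMV-denominated risk tolerance into a per-period budget:
    the expected GMV the bandit may 'spend' on uncertain rankings.
    The 0.5% default is illustrative, not the client's actual policy."""
    return period_gmv * cap_pct / 100.0

def can_explore(spent_so_far, expected_cost, budget):
    """Gate an exploration action: allow it only if it stays within budget."""
    return spent_so_far + expected_cost <= budget
```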
