EdTech Learning Platform Scalability

A global learning platform flattened traffic spikes during live classes: an autoscaling media path, a tuned CDN strategy, and data-tier sharding preserved sub-second interactions for 3M+ MAU.

Client overview

Industry focus
EdTech
Portfolio segment
EdTech
Organization profile
Public benefit corporation, ~250 staff, learners in 90+ countries

Product-market fit pushed MAU past the infrastructure's design assumptions, and Sunday exam windows saturated API clusters. Educators demanded proctored assessments with integrity signals; finance needed predictable unit economics before the next curriculum partnership with a national ministry.

Problem

Live class and exam bursts overwhelmed monolithic APIs; caching inconsistencies corrupted assessment submissions.

Edge POPs lacked tuned WebRTC TURN allocations, so learners in LATAM experienced audio drift during breakout rooms. A single-node Redis cache triggered thundering herds whenever popular "hero" teachers' sessions began simultaneously worldwide.
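One standard remedy for that thundering-herd pattern is request coalescing ("single-flight"): when a hot key misses, only the first caller recomputes while concurrent callers await the same promise. The sketch below is illustrative, not the platform's actual code; an in-memory Map stands in for Redis, where a distributed lock (e.g. `SET NX PX`) would guard the recompute instead.

```typescript
type Loader<T> = () => Promise<T>;

class SingleFlightCache<T> {
  private cache = new Map<string, T>();
  private inFlight = new Map<string, Promise<T>>();

  async get(key: string, load: Loader<T>): Promise<T> {
    const hit = this.cache.get(key);
    if (hit !== undefined) return hit;

    // Coalesce: if a load for this key is already running, reuse it
    // instead of issuing another expensive recompute.
    const pending = this.inFlight.get(key);
    if (pending) return pending;

    const p = load().then((value) => {
      this.cache.set(key, value);
      this.inFlight.delete(key);
      return value;
    });
    this.inFlight.set(key, p);
    return p;
  }
}
```

With this in place, ten thousand learners joining a hero session at the same second produce one backing-store load rather than ten thousand.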

The assessment service stored attempts in-row with course-content tables, locking hot partitions. Students occasionally saw stale progress bars, leading to duplicate retries.

Vendor CDN bills spiked nonlinearly because adaptive bitrate ladders were untuned for mobile-first learners on 3G-equivalent links.
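Ladder tuning of the kind described above boils down to picking the highest rung that fits under a safety margin of measured throughput. The variant bitrates below are hypothetical placeholders for a 3G-friendly ladder, not the platform's real encoding profile:

```typescript
interface Variant {
  height: number;
  bitrateKbps: number;
}

// A trimmed, mobile-first ladder: dense rungs at the low end, where
// 3G-equivalent links (~400-700 kbps) actually live.
const LADDER: Variant[] = [
  { height: 144, bitrateKbps: 150 },
  { height: 240, bitrateKbps: 300 },
  { height: 360, bitrateKbps: 600 },
  { height: 480, bitrateKbps: 1200 },
  { height: 720, bitrateKbps: 2500 },
];

// Pick the highest variant whose bitrate fits under a safety fraction of the
// measured throughput; fall back to the lowest rung rather than stalling.
function selectVariant(throughputKbps: number, safety = 0.8): Variant {
  const fits = LADDER.filter((v) => v.bitrateKbps <= throughputKbps * safety);
  return fits.length > 0 ? fits[fits.length - 1] : LADDER[0];
}
```

Untuned ladders overspend by serving rungs no mobile learner can sustain; trimming them is where the CDN savings come from.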

Solution

Decomposed hot paths: stateless signaling, sharded Redis, read-scaled Postgres, hierarchical CDN caching with signed segment URLs, and integrity-signal streaming to the proctoring subsystem.

We split read and write paths: a command bus for attempts, CQRS projections for dashboards, and an outbox pattern feeding analytics. The live-class control plane moved to dedicated autoscaling groups with predictive scaling informed by timetable imports.
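The transactional-outbox idea is simple: the attempt write and its analytics event commit atomically, and a relay later drains the outbox to Kafka. A minimal sketch, with in-memory arrays standing in for Postgres tables and hypothetical names (`saveAttempt`, `drainOutbox`) rather than the platform's real schema:

```typescript
interface OutboxRow {
  id: number;
  topic: string;
  payload: string;
  published: boolean;
}

const attempts: { id: number; score: number }[] = [];
const outbox: OutboxRow[] = [];
let seq = 0;

// Stand-in for "BEGIN; INSERT attempt; INSERT outbox; COMMIT;" — both rows
// land or neither, so analytics can never miss a committed attempt.
function saveAttempt(score: number): void {
  const id = ++seq;
  attempts.push({ id, score });
  outbox.push({
    id,
    topic: "attempt.scored",
    payload: JSON.stringify({ id, score }),
    published: false,
  });
}

// Relay: drain unpublished rows (to Kafka, in production) and mark them sent.
// Returns how many events were published in this pass.
function drainOutbox(publish: (row: OutboxRow) => void): number {
  const pending = outbox.filter((r) => !r.published);
  for (const row of pending) {
    publish(row);
    row.published = true;
  }
  return pending.length;
}
```

The design choice worth noting: because the event is written in the same transaction as the attempt, a crash between commit and publish only delays the event; the relay picks it up on its next pass.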

Redis Cluster served session state and rate limits from replica reads, and per-teacher fairness policies prevented teacher-induced hotspots. Postgres gained tenant-based partitioning, with a Citus-style coordinator for the heaviest tenants.
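A fairness policy like the one mentioned above is commonly a per-key token bucket, so one hero teacher's class cannot starve everyone else's requests. This is an in-memory sketch under that assumption; in Redis Cluster the bucket state would live in a hash-slotted key updated atomically (e.g. via a Lua script):

```typescript
interface Bucket {
  tokens: number;
  lastRefillMs: number;
}

class RateLimiter {
  private buckets = new Map<string, Bucket>();

  // capacity: max burst size; refillPerSec: sustained allowed rate.
  constructor(private capacity: number, private refillPerSec: number) {}

  allow(key: string, nowMs: number): boolean {
    const b = this.buckets.get(key) ?? { tokens: this.capacity, lastRefillMs: nowMs };
    // Refill lazily, proportional to elapsed time, capped at capacity.
    const elapsedSec = (nowMs - b.lastRefillMs) / 1000;
    b.tokens = Math.min(this.capacity, b.tokens + elapsedSec * this.refillPerSec);
    b.lastRefillMs = nowMs;

    const allowed = b.tokens >= 1;
    if (allowed) b.tokens -= 1;
    this.buckets.set(key, b);
    return allowed;
  }
}
```

Keying the bucket per teacher (or per session) is what makes it a fairness policy rather than a global throttle: each hotspot is capped independently.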

CDN optimization tuned ladder variants per geography; prefetch hints aligned with LMS navigation graphs.

Implementation

  1. Traffic characterization

     Captured WebRTC telemetry, HTTP waterfall baselines, and Redis slow logs during peak Sundays. Chaos experiments injected regional latency to validate adaptive UX messaging.

  2. Incremental extraction

     Strangled assessment APIs behind a BFF with feature flags; replay tests compared scoring outputs against the legacy monolith across millions of archived attempts.

  3. Cost-performance tradeoffs

     Aligned committed-use discounts to predictable exam windows; lifecycle policies moved cold recordings to cheaper storage classes.
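The predictive-scaling step above, driving autoscaling from timetable imports rather than CPU, can be sketched as a function from the academic calendar to desired replica counts. Class sizes, per-replica capacity, and the warm-up window below are all assumed figures for illustration:

```typescript
interface Session {
  startMin: number;          // minutes since midnight
  endMin: number;
  expectedLearners: number;  // from timetable/enrollment import
}

const LEARNERS_PER_REPLICA = 500; // assumed capacity of one signaling replica
const MIN_REPLICAS = 2;           // floor for off-peak availability
const WARMUP_MIN = 10;            // scale up ahead of the bell, not at it

// For a given minute of the day, sum expected learners across sessions whose
// pre-warmed window covers it, then size the fleet with a fixed floor.
function desiredReplicas(timetable: Session[], minute: number): number {
  const learners = timetable
    .filter((s) => minute >= s.startMin - WARMUP_MIN && minute < s.endMin)
    .reduce((sum, s) => sum + s.expectedLearners, 0);
  return Math.max(MIN_REPLICAS, Math.ceil(learners / LEARNERS_PER_REPLICA));
}
```

Evaluating this for each upcoming minute yields a scaling schedule that can be pushed to the autoscaling groups as scheduled actions, so capacity is already warm when the first learner joins.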

Tools & platforms

  • Kubernetes HPA/VPA
  • CloudFront + Lambda@Edge
  • Redis Cluster
  • pg_partman
  • k6 load tests

Engineering challenges addressed

  • Maintaining assessment integrity while scaling out writers without serializing globally.
  • Teacher trust during migration — transparent status pages and rollback drills.

Tech stack

  • Next.js
  • Node.js
  • PostgreSQL
  • Redis
  • Kubernetes
  • AWS
  • WebRTC
  • Kafka
  • OpenTelemetry

Results

  • Sustained 40k concurrent learners with <1.5s p95 interaction latency
  • 61% lower infra cost per MAU after 6 months of tuning
  • Assessment integrity incidents dropped 92% vs. prior term

Quantified impact

  • 40k concurrent learners validated in load + production peaks

    Observed during national exam collaboration.

  • 78% reduction in CDN overspend

    Via ladder tuning and segment TTL discipline.

Key takeaways

  • EdTech peaks are calendar-driven — invest in predictive autoscaling tied to academic calendars, not just CPU metrics.
  • CQRS pays off when reads dwarf writes during revision windows.
  • Integrity features must be designed with observability — not bolted after cheating incidents appear on Twitter.

Book a free consultation — we respond within one business day.
