CookiNUM: Real-time streaming platform

Freelance Full-Stack Engineer · 2026 · 12 months, ongoing · 1 person

WebRTC platform for live culinary teaching, deployed across 20+ public institutions on locked-down university networks.

Context

CookiNUM is a UPEC action-research program for inclusive culinary training. On-site, a teacher uses a GoPro and a Python desktop app. Remote classrooms watch the live stream and talk back, with everything running through a self-hosted server. As the only engineer on the project, I built the desktop app, the web client, the backend and the deployment, working with the pedagogy and research teams.

The teaching loop only works if the remote class can ask questions back in real time. HLS adds 2–6 s of latency and is one-way, which breaks the loop. On top of that, most target institutions sit behind academic firewalls that block outbound UDP and only let :443/tcp through, which breaks default WebRTC configurations.

Constraints

  • Sub-second media latency, both directions, on consumer networks
  • Most institutional networks only let :443/tcp out
  • Inclusive UX: learners include people with disabilities and adults far from the job market. The client has to be usable by keyboard and voice, with minimal onboarding, rather than relying on mouse precision
  • Production runs on university-owned infrastructure: hardening required by the DSI (the university IT department), covering rootless containers, capability drops, read-only filesystems, network isolation and pinned image digests
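Part of the hardening list above can be checked mechanically before a deployment. The following is a minimal sketch, not the project's actual tooling: it assumes the Compose file has already been parsed into a dict, and `hardening_findings` and the example service are hypothetical names. It only covers the three items visible in a service definition (`image`, `read_only`, `cap_drop`); rootless mode and network isolation are configured elsewhere and are not checked here.

```python
import re

# An image reference pinned by digest ends in "@sha256:" plus 64 hex characters.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def hardening_findings(name: str, service: dict) -> list:
    """Return human-readable findings for one parsed Compose service."""
    findings = []
    if not DIGEST_RE.search(service.get("image", "")):
        findings.append(f"{name}: image is not pinned by sha256 digest")
    if service.get("read_only") is not True:
        findings.append(f"{name}: root filesystem is not read-only")
    if "ALL" not in service.get("cap_drop", []):
        findings.append(f"{name}: capabilities are not dropped (cap_drop: [ALL])")
    return findings


# Hypothetical service definitions, for illustration only.
hardened = {
    "image": "livekit/livekit-server@sha256:" + "0" * 64,  # placeholder digest
    "read_only": True,
    "cap_drop": ["ALL"],
}
unhardened = {"image": "livekit/livekit-server:latest"}

ok_findings = hardening_findings("sfu", hardened)
bad_findings = hardening_findings("sfu", unhardened)
```

Here `ok_findings` is empty, while `bad_findings` lists one entry per missing control, which makes the check easy to run as a CI step ahead of the deploy job.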

Approach

Self-hosted LiveKit SFU on a single server, with all real-time traffic tunneled through TURN/TLS on :443 via Traefik. FastAPI as the single source of truth for sessions, tokens and voice floor control. PySide6 desktop app for the teacher (GoPro + offline French voice commands via Vosk). React/TS client for learners. The whole stack runs under Docker Compose and is deployed by a GitHub Action.
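To make "voice floor control" concrete, here is a minimal sketch of the kind of state the backend could own. It is an assumption, not the project's implementation: the policy (one speaker at a time, first-come queue) and the `VoiceFloor` class are hypothetical, and a real version would sit behind FastAPI endpoints and update LiveKit publish permissions.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VoiceFloor:
    """Tracks who may speak in one room: a single holder plus a FIFO queue."""

    holder: Optional[str] = None
    queue: deque = field(default_factory=deque)

    def request(self, identity: str) -> bool:
        """Grant the floor if it is free, otherwise queue. True if granted."""
        if self.holder == identity:
            return True
        if self.holder is None:
            self.holder = identity
            return True
        if identity not in self.queue:
            self.queue.append(identity)
        return False

    def release(self, identity: str) -> Optional[str]:
        """Give up the floor (or leave the queue). Returns the new holder."""
        if self.holder == identity:
            self.holder = self.queue.popleft() if self.queue else None
        elif identity in self.queue:
            self.queue.remove(identity)
        return self.holder


floor = VoiceFloor()
first = floor.request("learner-a")       # floor was free: granted
second = floor.request("learner-b")      # floor is taken: queued
next_holder = floor.release("learner-a")  # floor passes to the queue head
```

Keeping this state in one place means the desktop app and the web client never have to agree between themselves on who is allowed to speak; they only ask the server.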

Key decisions

WebRTC (LiveKit SFU) instead of HLS for media

First design was GoPro → RTMP → MediaMTX → HLS. HLS adds 2–6 s of latency and is one-way, which rules out teacher↔learner voice. LiveKit's SFU brings end-to-end latency under a second and natively handles two-way audio in the same room, so the media transport and the voice channel collapse into a single piece of infrastructure to operate.

Alternatives considered
  • Low-latency HLS plus a separate voice channel: two media systems to operate, weaker interactivity
  • LiveKit Cloud: fastest to ship, but per-minute billing and US data residency ruled it out for a public-sector rollout

Self-hosted LiveKit on one server, sized from real measurements

Operational target is 60 viewers per room (one extended class). On a 4 vCPU / 8 GB VPS, two empirical load tests (10, then 30 simulated subscribers via livekit-cli on a real teacher session) measured 15.7% and 27.3% of one core on the LiveKit container, with 0.017% and 0.28% packet loss client-side. Linear projection from the 30-subscriber run (2 × 27.3%) puts 60 viewers at ~55% of one core (~14% of the box), a ~7× margin against CPU saturation. Packet loss stayed under 0.3% in both tests. No multi-node SFU, no Kubernetes.
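The sizing arithmetic can be reproduced from the two measurements. The sketch below is illustrative: the function names are hypothetical, and which projection method was used originally is an assumption. Proportional scaling from the 30-subscriber test reproduces the ~55% figure, while a straight line through both points gives a lower estimate, so the quoted number is the more conservative of the two.

```python
# Measured points from the two load tests: (subscribers, CPU % of one core).
MEASUREMENTS = [(10, 15.7), (30, 27.3)]
TARGET_VIEWERS = 60
VCPUS = 4


def proportional_projection(point, target):
    """Scale a single measurement proportionally (assumes no fixed baseline)."""
    subscribers, cpu = point
    return cpu * target / subscribers


def two_point_projection(p1, p2, target):
    """Fit a line through both measurements and extrapolate to the target."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    return y1 + slope * (target - x1)


proportional = proportional_projection(MEASUREMENTS[1], TARGET_VIEWERS)
fitted = two_point_projection(MEASUREMENTS[0], MEASUREMENTS[1], TARGET_VIEWERS)
box_share = proportional / VCPUS      # share of the whole 4-vCPU box
margin = VCPUS * 100 / proportional   # headroom before all cores saturate

print(f"proportional: {proportional:.1f}% of one core")  # 54.6
print(f"two-point fit: {fitted:.1f}% of one core")       # 44.7
print(f"box share: {box_share:.1f}%, margin: {margin:.1f}x")  # 13.7%, 7.3x
```

Two data points are thin for an extrapolation, so both numbers are best read as estimates; a third load test at or near 60 subscribers would confirm them directly.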

Alternatives considered
  • LiveKit Cloud: same media quality and no self-hosting dossier to prepare for the DSI, but blocked on cost and data residency
  • Theoretical capacity sizing only: would not pass DSI review

Tech stack

  • Python 3.12, FastAPI, Pydantic, uv, Ruff, Pytest
  • React 19, TypeScript, Vite, livekit-client
  • PySide6, Vosk (offline French speech recognition)
  • LiveKit (self-hosted SFU, TURN/TLS on :443)
  • Docker Compose, Traefik v3, Let's Encrypt, UFW
  • GitHub Actions

Outcome

The project is currently being tested across multiple institutions. We are running active field iterations with pedagogy and IT teams (feedback loops, fixes, and deployment adjustments) before wider rollout.