Weekly Behavioural Recommendation System
A production pipeline from second-level telemetry to content embeddings, behavioural vectors, and weekly per-user weight updates.
The challenge
Recommendations could not improve until consumption was measured to the second, aligned with content understanding, and per-user weights were refreshed on the weekly cycle the audience actually follows.
Aggregate metrics (starts, completes, likes) hid the signal that drives good recommendations: which seconds of which titles correlate with sustained engagement versus abandonment. The catalogue mixed genres, narrators, and topic arcs — without a consistent semantic layer, the system could not explain why two sessions looked similar to a user even when titles differed. Prior stacks refreshed too often for users who do not discover new content daily, adding churn to the model without improving relevance. Event volume at second granularity ruled out naive relational designs for hot paths; the team needed clear separation between streaming ingestion, batch recomputation, and serving.
The system
Decision system built
We engineered a recommendation decision system that ingests playback at second resolution, enriches each interval with a content-understanding representation (taxonomy, derived metadata, and dense vectors), and joins engagement to semantics so each user's behaviour is expressed in a comparable vector space. Similarity across users and items informs candidate generation and re-ranking. Per-user weights are recomputed on a weekly schedule, a cadence validated by product analytics, via queued jobs (Redis), with raw and intermediate data landing in S3 and profile and weight documents in MongoDB for schema flexibility and historical audit.
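The core join, from seconds listened to semantics, can be sketched as a consumption-weighted average of content embeddings. The sketch below is a minimal illustration with invented events and vectors; the production pipeline derives embeddings from the content-understanding layer and events from second-level telemetry.

```python
# Sketch: express a user's behaviour as a vector in the same space as the
# content embeddings, weighting each title's embedding by seconds listened.
# All identifiers and numbers here are hypothetical, not the production schema.

def behaviour_vector(events, item_embeddings):
    """events: list of (item_id, seconds_listened); item_embeddings: dict id -> vector."""
    dim = len(next(iter(item_embeddings.values())))
    acc = [0.0] * dim
    total = 0.0
    for item_id, seconds in events:
        vec = item_embeddings.get(item_id)
        if vec is None:  # item missing a semantic representation: skip, don't guess
            continue
        for i, v in enumerate(vec):
            acc[i] += seconds * v
        total += seconds
    if total == 0:
        return acc  # cold-start: zero vector, routed to explicit fallbacks
    return [a / total for a in acc]

embeddings = {"ep1": [1.0, 0.0], "ep2": [0.0, 1.0]}
events = [("ep1", 300), ("ep2", 100)]  # a deep listen versus a shallow start
print(behaviour_vector(events, embeddings))  # -> [0.75, 0.25]
```

Because the behaviour vector lives in the same space as the catalogue, two sessions on different titles can still look similar when their semantics overlap.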
System components
Second-level consumption pipeline: playback position, dwell, skip, and completion events normalised into a time-series model per user per asset
Content understanding layer: unified semantic representation per title/episode (structured tags, text-derived features, and encoded vectors) so seconds listened map to topics and tone, not only IDs
Engagement × semantics join: align each consumption segment to the content state at that timestamp for training and online feature assembly
Behavioural embedding and similarity: vector encodings of user trajectories and catalogue items to surface neighbours in behaviour and in content space
Weekly per-user weight job: batch recompute of ranking weights and preferences from the prior window, published to serving config — tuned to weekly discovery patterns rather than daily noise
Redis: queues and coordination for ingestion workers, feature backfills, and weekly recomputation; low-latency caches where needed for serving hot keys
S3: durable store for raw events, parquet-style aggregates, model artefacts, and reproducibility of weekly training snapshots
MongoDB: document model for user profiles, per-week weight versions, cold-start fallbacks, and catalogue-adjacent metadata that evolves without rigid migrations
Serving integration: candidate retrieval and re-ranking using updated weekly weights with guardrails for freshness, diversity, and business rules
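The weekly per-user weight job above can be sketched as a queue-driven loop: user IDs are enqueued (a Redis list in production, an in-memory deque here), weights are recomputed from the prior window, and a versioned weight document is written for serving. The weight formula, version keys, and stores below are illustrative assumptions, not the production implementation.

```python
# Sketch of the weekly recompute loop. Stand-ins: a deque for the Redis queue,
# a dict for the MongoDB weight collection. The toy weighting is the share of
# engaged seconds per genre over the weekly window.
from collections import deque

queue = deque(["user-1", "user-2"])   # stand-in for a Redis work queue
weight_store = {}                     # stand-in for MongoDB weight documents

def recompute_weights(user_id, window_events):
    totals = {}
    for genre, seconds in window_events.get(user_id, []):
        totals[genre] = totals.get(genre, 0) + seconds
    grand = sum(totals.values()) or 1
    return {g: s / grand for g, s in totals.items()}

window = {"user-1": [("history", 1200), ("true-crime", 400)],
          "user-2": [("comedy", 600)]}

week = "2024-W01"  # hypothetical version key: one document per user per week
while queue:
    uid = queue.popleft()
    weight_store[(uid, week)] = recompute_weights(uid, window)

print(weight_store[("user-1", week)])  # -> {'history': 0.75, 'true-crime': 0.25}
```

Keying documents by (user, week) is what makes each run auditable and each publish reversible: serving reads one version, history keeps the rest.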
How we worked
Engagement scope
End-to-end design and implementation of second-level ingestion, content understanding and vector pipeline, weekly per-user weight computation, Redis queuing, S3 data lake patterns, MongoDB profile store, and integration with the existing delivery and ranking stack.
Timeline
Phased delivery: ingestion and semantic baseline first, weekly job and serving weights second, iterative model and guardrail tuning against live holdout metrics.
Operating model
Joint squads across data, backend, and product; clear ownership of event contracts, weekly job SLAs, and rollback of weight versions; monitoring on queue depth, job failure rates, and ranking drift week-over-week.
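The week-over-week ranking-drift monitor mentioned above can be approximated with a simple overlap metric: compare each user's top-N recommendations under consecutive weekly weight versions. The metric choice and lists below are illustrative assumptions, not the production monitor.

```python
# Sketch: Jaccard overlap of top-N recommendation lists between two weekly
# runs; drift is one minus overlap. A sudden drift spike across many users
# would flag a suspect weight publish for review or rollback.
def topn_overlap(prev_topn, curr_topn):
    prev, curr = set(prev_topn), set(curr_topn)
    union = prev | curr
    return len(prev & curr) / len(union) if union else 1.0

prev_week = ["a", "b", "c", "d"]
this_week = ["a", "b", "e", "f"]
drift = 1.0 - topn_overlap(prev_week, this_week)
print(round(drift, 3))  # 2 shared of 6 distinct -> overlap 1/3, drift 2/3
```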
Outcomes
Business impact & measurable results
Recommendations grounded in second-level truth: ranking could distinguish shallow starts from deep listening on the same title
Semantic matching improved relevance for long-tail catalogue items where sparse co-occurrence had previously failed
Weekly refresh cycle aligned with real usage — fewer spurious daily shifts and more stable personalised surfaces for returning listeners
Operational clarity: ingestion, training, and serving decoupled via queues and object storage, with MongoDB supporting iterative schema evolution on profile and weight documents
Vector-based behavioural similarity enabled explainable cohort-style diagnostics (which behaviour clusters drive which recommendations) without exposing individual identities in product surfaces
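The cohort-style diagnostics above rest on neighbour lookup in behaviour space. A minimal sketch, with invented user vectors (the production system derives them from the engagement and semantics join), is cosine similarity over behaviour vectors:

```python
# Sketch: find a user's nearest behavioural neighbours by cosine similarity.
# Vectors here are hypothetical two-dimensional examples for illustration.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

users = {"u1": [0.9, 0.1], "u2": [0.8, 0.2], "u3": [0.1, 0.9]}

def nearest(uid, users, k=1):
    scores = [(other, cosine(users[uid], v))
              for other, v in users.items() if other != uid]
    return sorted(scores, key=lambda s: -s[1])[:k]

print(nearest("u1", users))  # u2 is u1's closest behavioural neighbour
```

Diagnostics operate on clusters of such vectors, so product surfaces can explain which behaviour cohort drove a recommendation without surfacing any individual identity.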
Governance
Trust, collaboration & governance
Event and profile handling designed around consent and retention policies — second-level data treated as high-sensitivity with minimisation and access controls
Weekly weight publishes versioned and reversible — bad runs do not require a full catalogue redeploy
Cold-start and low-data users routed through explicit fallbacks rather than overconfident personalisation
Content understanding and vectors documented for editorial and trust teams — not a black box to the business
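The versioned, reversible publish described above can be sketched as a pointer swap: each weekly run writes a new version document, serving follows a pointer per user, and a bad run is reverted by moving the pointer rather than redeploying anything. The structures and version keys below are illustrative stand-ins for the MongoDB documents.

```python
# Sketch: versioned weight publishing with rollback via an active-version
# pointer. Stores are in-memory dicts standing in for MongoDB collections.
versions = {}   # (user_id, version) -> weights
active = {}     # user_id -> currently served version

def publish(user_id, version, weights):
    versions[(user_id, version)] = weights
    active[user_id] = version

def rollback(user_id, to_version):
    if (user_id, to_version) not in versions:
        raise KeyError("unknown version, nothing to roll back to")
    active[user_id] = to_version

def serve_weights(user_id):
    return versions[(user_id, active[user_id])]

publish("u1", "2024-W01", {"history": 0.7})
publish("u1", "2024-W02", {"history": 0.2})   # suppose this run was bad
rollback("u1", "2024-W01")
print(serve_weights("u1"))  # -> {'history': 0.7}
```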
Reframe
Not a daily recommender tuned for constant novelty — a weekly decision system that respects how this audience actually consumes.
Across every engagement, the goal is the same: engineer a system that makes better decisions — faster, more consistently, and at scale — than the process it replaces.
Start a discovery
Most engagements begin with a conversation about context.
We do not send a proposal before we understand the problem. Start by telling us about your decision context — we will identify the highest-leverage intervention areas before any scope is agreed.