AI-Powered Search Relevance & Revenue Engine

$ cd ~/projects/search-relevance-platform agent.shipped · in production

Search That
Sells.

Keyword matching was leaving revenue on the table.
We swapped it for a Cohere Embed v4 retrieval + Cohere Rerank 3.5
+ ColBERTv2 late-interaction stack on Vespa with BBQ quantisation,
served via ONNX. 2,400 RPS, p99 under 45 ms, CTR up 50% and
$12M of annualised incremental revenue.

AI-powered search relevance & revenue engine

Industry: eCommerce
Timeline: 16 weeks
Key result: 50% CTR uplift, $12M revenue
Tech stack: Cohere Embed v4 + Rerank 3.5, ColBERTv2 / PLAID, Matryoshka short-vectors, Vespa with BBQ, ONNX, off-policy LTR (IPS / doubly robust), Python

We ran a semantic search stack at 2,400 RPS against 18M monthly searches. CTR climbed 50%, incremental revenue hit $12M annualised, p99 latency stayed under 45 ms and uptime held at 99.97%.

Cohere Embed v4 handles retrieval (Matryoshka short-vectors for fast pre-filter), ColBERTv2 / PLAID gives us late-interaction recall, Cohere Rerank 3.5 (distilled) does the final rerank, and the index lives in Vespa with BBQ binary quantisation. ONNX serves the encoders. We tuned the whole thing against CTR and revenue with off-policy LTR (IPS / doubly robust) so feedback loops don’t poison the ranker, not against an offline nDCG number nobody ships to.

AI Delivery Approach

Query understanding — We trained intent classifiers so the retriever understands “running shoes for flat feet” instead of three loose tokens, and added query rewriting via a small LLM behind a strict latency budget.
Learning-to-rank on real signals — The reranker is fine-tuned on clicks and conversions, but corrected with off-policy methods (IPS / doubly robust / counterfactual risk minimisation) so the ranker doesn’t learn its own positional bias. Relevance is defined by what customers actually buy.
Latency budget held — Vespa with BBQ binary quantisation + Matryoshka short-vector pre-filter + a distilled Cohere Rerank 3.5 (top-K = 50) serves 2,400 RPS with p99 under 45 ms. We batched where we could, quantised where it didn’t cost quality.
Every change goes through A/B — Ranking and retrieval changes ship through an A/B framework wired to revenue attribution, so the 50% CTR lift and 37% revenue lift are numbers we can defend.

What was actually hard

Semantic retrieval at 2,400 RPS with a p99 under 45 ms rules out a lot of off-the-shelf stacks. We had to get the index fast, the scoring fast, and the evaluation honest so every model update showed up as real CTR and revenue, not just a nicer offline chart.

Project Outcome

CTR went up 50% and incremental revenue came in at 37%, which the A/B framework held steady across weeks. Request rate stayed at 2,400 RPS and the ranking quality didn’t drift once traffic patterns changed after launch.

> 50% CTR
uplift > $12M annualized
revenue > <45ms p99
latency > 99.97% uptime
SLA

Business metrics and ranking trends on tablet

Product manager reviewing growth dashboard

Cohere Embed v4Cohere Rerank 3.5ColBERTv2 / PLAIDMatryoshka embeddingsVespaBBQ quantisationONNXOff-policy LTR (IPS / DR)PythonA/B Framework

“We needed an AI search layer that understood product intent, not just keywords. ImmovableTech shipped a semantic engine that lifted our CTR by 50% and drove real incremental revenue.”