Every product wants recommendations. "Customers also bought." "Articles you might like." "More like this." It's one of the highest-leverage features you can ship, because every click on a recommendation extends a session that would otherwise have ended.
The problem is that "recommendation engine" used to mean a six-month project. You'd hire a data scientist, set up a feature store, train a collaborative filtering model, build a serving layer, and pray your traffic patterns matched the assumptions in your training data.
That world is mostly over. You can build a recommendation engine that works in production with a handful of API calls and zero ML expertise. Not a toy version. A real one.
This post walks through how to do it, what the actual options are, and where each one breaks.
The Three Kinds of Recommendations
Before you build anything, get clear on which kind you actually need. The three categories solve different problems, and the wrong choice means recommendations that look fine in a demo and tank in production.
Content-based. Recommend items similar to the one a user is looking at. "More like this product." "Related articles." Doesn't need user history. Works on day one with zero traffic. The classic cold-start solution.
Collaborative. Recommend items based on what similar users liked. "Users who bought this also bought." Needs interaction data (clicks, purchases, ratings). Doesn't work until you have meaningful traffic.
Hybrid. Mix both. Use content similarity for new items and users, fall back to collaborative once you have data. This is what most real systems end up doing.
For most products shipping today, content-based is where you start. It works immediately, and the modern tooling makes it almost trivial.
What Content-Based Recommendations Actually Need
Pre-2023, building content-based recommendations meant TF-IDF vectors, cosine similarity over your product catalog, and a small army of edge cases (synonyms, multilingual, image features, the cold start of cold starts).
Today, the whole thing is one concept: embeddings.
An embedding is a list of numbers that represents the meaning of a piece of data. A product description, a blog post, an image. Two items with similar meanings produce similar embeddings. Find the items with embeddings closest to a target item, and you have your recommendations.
That's it. That's the algorithm.
The work is no longer the algorithm. The work is the infrastructure: generating embeddings, storing them, searching them quickly enough to serve a real product.
The Naive Approach (And Why It Falls Apart)
Here's the version most tutorials show you:
- Use OpenAI's embedding API to generate a vector for every item
- Store the vectors in a list, in memory
- On each request, compute cosine similarity against every vector
- Return the top N
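In code, the naive version is only a few lines. A minimal sketch, assuming you've already embedded every item into { id, vector } records:

// Brute-force nearest neighbors: score every item, sort, take the top N.
// Assumes `items` is an in-memory array of { id, vector } you embedded earlier.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function naiveRecommend(target, items, n) {
  return items
    .filter(item => item.id !== target.id)
    .map(item => ({ id: item.id, score: cosineSimilarity(target.vector, item.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, n);
}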
This works for 1,000 items. It dies at 100,000. It dies harder at a million. Linear search over a million vectors per request is not a thing you ship.
The fix is approximate nearest neighbor (ANN) search, which lets you find the closest vectors in milliseconds even at scale. ANN is the entire reason vector databases exist.
So the real architecture is:
- Generate embeddings for every item
- Push them into something that does ANN search
- Query that thing for nearest neighbors
- Return the results
The question is: what do you use for the first two steps?
Option 1: Build It Yourself
OpenAI for embeddings. Pinecone, Qdrant, or pgvector for storage and search. Your own glue code to keep them in sync.
This works. Plenty of teams do it. But it's three services, two API keys, and a sync layer you have to maintain. Every time a product changes, you re-embed and re-upsert. Every time a new product is added, same thing. Every time the embedding model is updated, you re-embed your entire catalog.
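To make the sync tax concrete, here's roughly what one upsert path looks like. A sketch, assuming pgvector and the OpenAI Node SDK; the table schema and field names are illustrative:

// Re-embed one item and upsert it into Postgres/pgvector.
// Assumes a table: CREATE TABLE items (id text PRIMARY KEY, embedding vector(1536));
import OpenAI from 'openai';
import pg from 'pg';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const db = new pg.Pool({ connectionString: process.env.DATABASE_URL });

async function upsertItem(item) {
  const res = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: `${item.title} ${item.description}`,
  });
  // pgvector accepts '[0.1,0.2,...]' literals, which JSON.stringify produces
  const vector = JSON.stringify(res.data[0].embedding);
  await db.query(
    `INSERT INTO items (id, embedding) VALUES ($1, $2)
     ON CONFLICT (id) DO UPDATE SET embedding = EXCLUDED.embedding`,
    [item.id, vector]
  );
}

And that's one of three moving parts. You still need the query path, plus something that calls upsertItem on every catalog change.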
Plus you're paying for embeddings (per token), the vector database (per GB and per query), and the engineering time to glue it together. The "I'll just wire up a few APIs" estimate is usually three weeks of real work.
It's the right choice if you have an ML team and need full control over embedding models. It's the wrong choice for most product teams shipping a feature.
Option 2: Use a Search API
A search API hides all of this. You send in your data, you call a search endpoint, you get results back. Embeddings, indexing, sync, everything is handled.
For a recommendation engine, the workflow is:
- Insert your items (products, articles, whatever) into the API
- To get recommendations for a given item, search using that item's text or ID
- Return the closest matches, excluding the item itself
That's the whole implementation.
This is what Vecstore is built for. There's no embedding step. No second database. No sync code. You insert your records and call the search endpoint.
A typical "more like this" feature in Vecstore looks like this:
// On product page load, get 6 similar products
const response = await fetch(`https://api.vecstore.app/databases/${dbId}/search`, {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    query: currentProduct.title + ' ' + currentProduct.description,
    limit: 7, // 7 because we'll filter out the current product
  }),
});

const { results } = await response.json();
const recommendations = results
  .filter(r => r.id !== currentProduct.id)
  .slice(0, 6);
That's a content-based recommendation engine. It runs on day one, with no user data, and works the moment you have a catalog.
Beyond Text: Image and Multimodal Recommendations
A lot of recommendation use cases aren't text. Fashion, furniture, home decor, anything visual. Two products with nearly identical descriptions ("blue cotton shirt") can look completely different. Text embeddings will say they're the same. Your customers won't agree.
For visual products, you want recommendations based on the image. Same algorithm, different input.
With Vecstore, you upload the product images alongside the text and search becomes multimodal automatically. The recommendation query can be a product image, a product description, or both. The closest results are the ones that match in both visual and textual meaning.
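In practice the request looks like the text version with a different query input. A sketch only; the imageUrl field below is an assumption about the payload shape, not the documented API, so check the docs for the real parameter names:

// "Visually similar" query: search by the current product's image instead of its text.
// NOTE: `imageUrl` is a hypothetical field name, used here for illustration.
const response = await fetch(`https://api.vecstore.app/databases/${dbId}/search`, {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    imageUrl: currentProduct.imageUrl, // hypothetical: the image to match against
    limit: 7,
  }),
});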
This is the kind of feature that used to require running CLIP yourself, building an image preprocessing pipeline, and managing GPU inference. Now it's an API call.
Adding Behavioral Data Over Time
Once you have traffic, you can layer collaborative signals on top. The simplest version:
- Track which items each user views or buys
- For each user, build a profile by averaging the embeddings of the items they've engaged with
- Recommend items closest to that profile
This gives you "for you" style recommendations without training a model. It's an embedding average and a search. That's the entire system.
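The profile step is plain vector arithmetic. A minimal sketch, assuming you can fetch the embedding vector for each item the user engaged with:

// Average a user's engaged-item embeddings into one profile vector.
function buildProfile(vectors) {
  const dims = vectors[0].length;
  const profile = new Array(dims).fill(0);
  for (const v of vectors) {
    for (let i = 0; i < dims; i++) profile[i] += v[i];
  }
  return profile.map(x => x / vectors.length);
}

// Recommend by running a nearest-neighbor search with `profile` as the
// query vector (assumes your search layer accepts raw vectors as queries).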
You can get fancier (decay weights for older items, separate profiles for different categories, blending content and collaborative scores) but the baseline is something you can ship in an afternoon.
What to Track to Make Recommendations Better
Recommendations get better when you measure and iterate. The metrics that matter:
Click-through rate. What percentage of users click a recommendation when shown one? This is the surface-level signal. Track it per slot, per page, per category.
Conversion rate from recommendations. Of the users who clicked a recommendation, how many bought, signed up, or otherwise converted? CTR can be misleading; conversion rate is what matters.
Coverage. What percentage of your catalog ever shows up in recommendations? If a few popular items dominate every result, your tail content is invisible. That's a real problem at scale.
Diversity. Are recommendations all from the same category? Same brand? Same color? A 6-product carousel of identical items is a bad recommendation, even if each individual match is technically correct.
You don't need a metrics platform to start. You need event tracking on impressions and clicks, and a weekly review.
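The tracking itself is two events. A sketch, where `track` stands in for whatever analytics call you already have:

// Log an impression for each recommendation rendered and a click when one
// is followed. Slot and page let you break the numbers down later.
function onRecommendationsShown(items, slot, page) {
  for (const item of items) {
    track('rec_impression', { itemId: item.id, slot, page });
  }
}

function onRecommendationClick(item, slot, page) {
  track('rec_click', { itemId: item.id, slot, page });
}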
Common Mistakes
A few patterns that wreck recommendation quality:
Recommending the same item. Always exclude the current item from results. Sounds obvious, gets missed constantly.
Ignoring stock or availability. Recommending out-of-stock products kills trust fast. Filter at query time.
Using titles only. Product titles are short and noisy. Combine title and description (and category, brand, tags if you have them) when generating the embedding for similarity.
Showing too few results. A "more like this" with two items looks broken. A carousel needs 6+. Plan for it from the start.
Personalizing too aggressively too early. Without enough behavioral data, "personalized" recommendations are just random. Start with content-based and layer personalization in once you have signals worth using.
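The first two fixes are a couple of lines on the result set. A sketch; the inStock metadata field is an assumption about how you store availability, and a query-time filter is better if your API supports one:

// Post-process search results before rendering the carousel.
const recommendations = results
  .filter(r => r.id !== currentProduct.id) // never recommend the current item
  .filter(r => r.metadata?.inStock)        // hide out-of-stock items
  .slice(0, 6);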
When to Build Something More Complex
The setup above will carry you a long way. Most products never need more. But there are cases where it isn't enough:
- You have very specific business rules (boost margins, balance inventory, hit promotional goals).
- You have rich user history and the simple "average their embeddings" approach is leaving signal on the table.
- You're optimizing for long-term metrics like retention, not next-click.
These are real problems and they're worth real ML investment when the time comes. But they're problems you have when you have traffic, data, and revenue. Don't solve them on day one.
The Bottom Line
Recommendation engines used to be a project. Now they're a feature. The infrastructure that took ML teams six months to build is a managed API call away.
If you're shipping a product and want "more like this," "you might also like," or "for you" recommendations, you don't need a data scientist. You need embeddings, ANN search, and a way to query both. A search API gives you all three.
Start with content-based, ship it fast, layer on behavioral data as you collect it. The teams that win at recommendations are the ones who ship something simple and iterate, not the ones who spend a year building the perfect system.