How to Build an Image Similarity Feature in Python

This tutorial builds a working image similarity feature in Python. You'll be able to upload a photo and find the most visually similar images in your database, or describe what you're looking for in plain text and get matching results.

We'll use requests for the API calls and asyncio with aiohttp for batch inserting images efficiently. By the end you'll have a reusable Python module and a small Flask API you can plug into any project.

What We're Building

A Python service that can:

Insert images into a searchable database
Find visually similar images given a query image
Find matching images given a text description
Batch insert thousands of images without it taking forever

Prerequisites

Python 3.10+
A Vecstore account (free tier works)
An image database created in the Vecstore dashboard
Your API key and database ID

Step 1: Install Dependencies

pip install requests flask python-dotenv aiohttp

Create a .env file:

VECSTORE_API_KEY=your_api_key_here
VECSTORE_DB_ID=your_database_id_here

Step 2: Build the Core Module

Create vecstore.py. This handles all the API communication.

import os
import base64
import requests
from dotenv import load_dotenv

load_dotenv()

API_KEY = os.getenv("VECSTORE_API_KEY")
DB_ID = os.getenv("VECSTORE_DB_ID")
BASE_URL = "https://api.vecstore.app/api"

HEADERS = {
    "X-API-Key": API_KEY,
    "Content-Type": "application/json",
}


def insert_image_url(url: str) -> dict:
    """Insert an image by URL."""
    resp = requests.post(
        f"{BASE_URL}/databases/{DB_ID}/documents",
        headers=HEADERS,
        json={"image_url": url},
    )
    return resp.json()


def insert_image_file(path: str) -> dict:
    """Insert an image from a local file."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = requests.post(
        f"{BASE_URL}/databases/{DB_ID}/documents",
        headers=HEADERS,
        json={"image": b64},
    )
    return resp.json()


def search_by_text(query: str, top_k: int = 10) -> dict:
    """Search images by text description."""
    resp = requests.post(
        f"{BASE_URL}/databases/{DB_ID}/search",
        headers=HEADERS,
        json={"query": query, "top_k": top_k},
    )
    return resp.json()


def search_by_image_url(url: str, top_k: int = 10) -> dict:
    """Find similar images given an image URL."""
    resp = requests.post(
        f"{BASE_URL}/databases/{DB_ID}/search",
        headers=HEADERS,
        json={"image_url": url, "top_k": top_k},
    )
    return resp.json()


def search_by_image_file(path: str, top_k: int = 10) -> dict:
    """Find similar images given a local file."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = requests.post(
        f"{BASE_URL}/databases/{DB_ID}/search",
        headers=HEADERS,
        json={"image": b64, "top_k": top_k},
    )
    return resp.json()

Five functions. That's the entire integration, and you can use this module from any Python project.

Quick test:

from vecstore import insert_image_url, search_by_text

# insert a few images
insert_image_url("https://example.com/products/red-sneakers.jpg")
insert_image_url("https://example.com/products/blue-jacket.jpg")
insert_image_url("https://example.com/products/leather-boots.jpg")

# search by text
results = search_by_text("comfortable running shoes")
for r in results["results"]:
    print(f"{r['score']:.2f} - {r['vector_id']}")

The text query "comfortable running shoes" should rank the red sneakers image highest, even though no text describing that image was ever stored. The search works by meaning, not keywords.

Step 3: Batch Insert Images

If you have hundreds or thousands of images to insert, doing them one at a time is slow. The bottleneck is network round-trips, not the API itself.

asyncio with aiohttp fixes this by firing off many requests concurrently. Create batch_insert.py:

import asyncio
import os
import aiohttp
from dotenv import load_dotenv

load_dotenv()

API_KEY = os.getenv("VECSTORE_API_KEY")
DB_ID = os.getenv("VECSTORE_DB_ID")
BASE_URL = "https://api.vecstore.app/api"
URL = f"{BASE_URL}/databases/{DB_ID}/documents"

HEADERS = {
    "X-API-Key": API_KEY,
    "Content-Type": "application/json",
}


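async def insert_one(session: aiohttp.ClientSession, image_url: str) -> dict:
    async with session.post(URL, json={"image_url": image_url}) as resp:
        resp.raise_for_status()  # raise on 4xx/5xx instead of reporting success
        result = await resp.json()
        print(f"Inserted: {image_url}")
        return result


async def batch_insert(image_urls: list[str], concurrency: int = 20) -> list:
    # at most `concurrency` requests are in flight at any moment
    sem = asyncio.Semaphore(concurrency)

    async def throttled(session, url):
        async with sem:
            return await insert_one(session, url)

    # one shared session reuses connections; its headers apply to every request
    async with aiohttp.ClientSession(headers=HEADERS) as session:
        tasks = [throttled(session, url) for url in image_urls]
        # return_exceptions=True: one bad URL doesn't abort the rest of the batch
        return await asyncio.gather(*tasks, return_exceptions=True)


# Usage
if __name__ == "__main__":
    urls = [
        "https://example.com/products/001.jpg",
        "https://example.com/products/002.jpg",
        "https://example.com/products/003.jpg",
        # ... add your image URLs
    ]
    results = asyncio.run(batch_insert(urls))
    # gather preserves order, so results line up with urls
    failed = [u for u, r in zip(urls, results) if isinstance(r, Exception)]
    print(f"{len(urls) - len(failed)} inserted, {len(failed)} failed")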

The concurrency parameter controls how many requests run at the same time. 20 is a reasonable default; lower it if the API starts throttling or rejecting requests. If you're inserting thousands of images, this cuts the total time dramatically compared to sequential requests.
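To see what the semaphore does without touching the API, here is a small offline simulation. The hypothetical `fake_insert` stands in for `insert_one` and just sleeps instead of making a network call; `simulate` returns the peak number of requests that were in flight at once, which should never exceed `concurrency`.

```python
import asyncio


async def simulate(n_requests: int, concurrency: int, delay: float = 0.01) -> int:
    """Run n fake requests through a semaphore and return the peak number in flight."""
    sem = asyncio.Semaphore(concurrency)
    in_flight = 0
    peak = 0

    async def fake_insert(i: int) -> int:
        # hypothetical stand-in for a network call: just sleep for `delay` seconds
        nonlocal in_flight, peak
        async with sem:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(delay)
            in_flight -= 1
            return i

    await asyncio.gather(*(fake_insert(i) for i in range(n_requests)))
    return peak


if __name__ == "__main__":
    print(asyncio.run(simulate(100, concurrency=20)))  # peak in-flight count
```

Running `simulate(100, concurrency=20)` reports a peak of 20: the first 20 calls acquire the semaphore immediately, and every later call waits until one of them releases it.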

For reference, inserting 1,000 images sequentially takes about 15-20 minutes. With 20 concurrent requests, the same batch finishes in under 2 minutes.
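Those figures fit a simple back-of-the-envelope model. Assuming each request takes roughly one second end to end (an assumption implied by 1,000 sequential inserts taking 15-20 minutes, not a measured number), the hypothetical `estimate_seconds` helper below treats the batch as `ceil(n / concurrency)` waves of requests:

```python
import math


def estimate_seconds(n_images: int, latency_s: float, concurrency: int = 1) -> float:
    """Rough wall-clock estimate: requests run in waves of `concurrency` at a time.

    Assumes every request takes about `latency_s` seconds (a simplification,
    not a measured figure).
    """
    waves = math.ceil(n_images / concurrency)
    return waves * latency_s


sequential = estimate_seconds(1000, latency_s=1.0)                  # 1000 waves
concurrent = estimate_seconds(1000, latency_s=1.0, concurrency=20)  # 50 waves
print(f"sequential: {sequential / 60:.1f} min, concurrent: {concurrent:.0f} s")
```

This prints `sequential: 16.7 min, concurrent: 50 s`. Real batches will typically come in above the idealized figure because latency varies from request to request, which is why "under 2 minutes" is a realistic expectation.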

Step 4: Build a Flask API

If you want to expose image similarity as an HTTP endpoint (for a frontend, mobile app, or another service), here's a minimal Flask server. Create app.py:

import os
import tempfile

from flask import Flask, request, jsonify

import vecstore

app = Flask(__name__)


@app.route("/search/text", methods=["POST"])
def text_search():
    data = request.json
    results = vecstore.search_by_text(
        data["query"],
        top_k=data.get("top_k", 10),
    )
    return jsonify(results)


@app.route("/search/image", methods=["POST"])
def image_search():
    file = request.files["image"]
    # save the upload to a unique temp file; never build paths from file.filename
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        file.save(path)
        results = vecstore.search_by_image_file(path)
    finally:
        os.remove(path)  # clean up even if the search raises
    return jsonify(results)


@app.route("/insert", methods=["POST"])
def insert():
    data = request.json
    result = vecstore.insert_image_url(data["image_url"])
    return jsonify(result)


if __name__ == "__main__":
    app.run(port=5000, debug=True)

Run it with python app.py and test:

# text search
curl -X POST http://localhost:5000/search/text \
  -H "Content-Type: application/json" \
  -d '{"query": "red leather handbag"}'

# image search
curl -X POST http://localhost:5000/search/image \
  -F "image=@photo.jpg"

How Similarity Scoring Works

The API returns a score between 0 and 1 for each result.

0.90-1.0 - near-identical images (same photo, different crop or resolution)
0.75-0.90 - very similar (same type of object, similar colors and composition)
0.60-0.75 - related (same category, some visual overlap)
Below 0.60 - loosely related or unrelated

What counts as a "good match" depends on your use case. For duplicate detection, you want 0.90+. For "find similar products," 0.70+ usually works well. You control the threshold in your application logic.
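A minimal sketch of that application logic, assuming the response shape used in the quick test (a `results` list whose items carry `score` and `vector_id`). The hypothetical `label_score` maps a score onto the bands listed above, and `above_threshold` keeps only the matches you consider good enough:

```python
def label_score(score: float) -> str:
    """Map a similarity score to the rough bands described above."""
    if score >= 0.90:
        return "near-identical"
    if score >= 0.75:
        return "very similar"
    if score >= 0.60:
        return "related"
    return "loosely related or unrelated"


def above_threshold(response: dict, threshold: float) -> list[dict]:
    """Keep only results scoring at or above `threshold` (assumes a 'results' list)."""
    return [r for r in response.get("results", []) if r["score"] >= threshold]


# made-up response for illustration, shaped like the quick test's output
sample = {
    "results": [
        {"vector_id": "a", "score": 0.93},
        {"vector_id": "b", "score": 0.72},
        {"vector_id": "c", "score": 0.41},
    ]
}
for r in above_threshold(sample, threshold=0.70):
    print(r["vector_id"], label_score(r["score"]))
```

With a 0.70 threshold, `a` prints as near-identical and `b` as related, while `c` is dropped.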

Use Cases

A few practical things you can build with this:

Duplicate detection. Check if a newly uploaded image already exists in your database. Insert your existing images, then search with each new upload before inserting it (otherwise the upload would just match itself). If the top result scores 0.90 or higher, it's likely a duplicate.

import vecstore


def is_duplicate(image_path: str, threshold: float = 0.90) -> bool:
    # only the single closest match matters for duplicate detection
    results = vecstore.search_by_image_file(image_path, top_k=1)
    if results["results"]:
        return results["results"][0]["score"] >= threshold
    return False

"More like this" recommendations. User clicks on a product image. You search for similar images and show them as recommendations. No recommendation engine needed.

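One wrinkle: if the clicked product is itself in the database, it will likely come back as the top hit. A sketch of trimming it out, assuming you know the clicked item's `vector_id` and that results carry `vector_id` and `score` as in the quick test; `more_like_this` is a hypothetical helper name:

```python
def more_like_this(response: dict, clicked_id: str, limit: int = 5) -> list[str]:
    """Return up to `limit` recommended vector IDs, excluding the clicked item itself."""
    hits = [r for r in response.get("results", []) if r["vector_id"] != clicked_id]
    # results are assumed to arrive best-first; sort anyway to be explicit
    hits.sort(key=lambda r: r["score"], reverse=True)
    return [r["vector_id"] for r in hits[:limit]]
```

In practice you would pass in the response from `vecstore.search_by_image_url(clicked_image_url, top_k=limit + 1)`; asking for one extra result leaves room for the dropped self-match.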
Visual catalog search. A customer takes a photo of something they want to buy. Your app finds the closest matches in your product catalog. Works for furniture, clothing, parts, anything visual.

Content moderation. Check uploaded images against a database of known prohibited content. Flag anything above a similarity threshold.

Things to Keep in Mind

Image size. Sending large images (5MB+) slows things down because of upload time, not processing time. If you're working with high-res photos, consider resizing to 1000px on the longest side before sending. Search quality doesn't improve much beyond that.
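The resize math is just a uniform scale factor. Here is a sketch of a hypothetical `scaled_size` helper that keeps the aspect ratio and never upscales:

```python
def scaled_size(width: int, height: int, max_side: int = 1000) -> tuple[int, int]:
    """Shrink (width, height) so the longest side is at most max_side, keeping aspect ratio."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough; never upscale
    scale = max_side / longest
    # round to the nearest pixel, but never collapse a side to 0
    return max(1, round(width * scale)), max(1, round(height * scale))


print(scaled_size(4000, 3000))  # (1000, 750)
print(scaled_size(800, 600))    # (800, 600)
```

With Pillow installed (`pip install pillow`, which is not in the dependency list above), you could apply it as `img = img.resize(scaled_size(*img.size))` before encoding the image; Pillow's `Image.thumbnail((1000, 1000))` does much the same thing in place.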

Free tier. You get 100 free credits on signup, no credit card required. Each insert or search costs 1 credit, so that's 100 operations to get started: plenty for the quick test and the Flask endpoints, but not enough for a 1,000-image batch insert. After that it's pay-as-you-go.

Metadata. When you insert images, you can attach metadata (product name, category, price, whatever). This metadata comes back with search results. If you need to display product info alongside similar images, store it at insert time:

import requests
from vecstore import BASE_URL, DB_ID, HEADERS

resp = requests.post(
    f"{BASE_URL}/databases/{DB_ID}/documents",
    headers=HEADERS,
    json={
        "image_url": "https://example.com/products/shoe.jpg",
        "metadata": {
            "name": "Red Running Shoe",
            "price": 89.99,
            "category": "footwear",
        },
    },
)
print(resp.json())
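On the way out, you'll want to read those fields back. The exact response layout isn't shown in this tutorial, so the sketch below assumes each result carries the stored fields under a `metadata` key alongside `score` and `vector_id`; check a real response before relying on it. `describe_result` is a hypothetical helper:

```python
def describe_result(result: dict) -> str:
    """Format one search hit for display, tolerating missing metadata fields.

    Assumes metadata comes back under result["metadata"]; adjust the key
    if your actual responses differ.
    """
    meta = result.get("metadata") or {}
    # fall back to the vector ID when no product name was stored
    name = meta.get("name", result.get("vector_id", "unknown"))
    price = meta.get("price")
    price_text = f"${price:.2f}" if isinstance(price, (int, float)) else "n/a"
    return f"{result['score']:.2f}  {name}  {price_text}"
```

A hit for the shoe above with a score of 0.873 would render as `0.87  Red Running Shoe  $89.99`.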

What Else You Can Do

Same database, same API key:

Face search - upload a face photo, find every image of that person
OCR search - find images by the text inside them (signs, documents, screenshots)
NSFW detection - check images for content safety before storing them

No extra setup for any of these.

Wrapping Up

The full project is: one Python module (vecstore.py), one batch insert script, and optionally a Flask server. No ML models, no GPU, no vector database to manage.

The code in this tutorial works as-is. For production you'd add error handling, retries, and logging. But the similarity search itself is ready to go.
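Retries are the simplest of those to add. Below is a minimal sketch of a hypothetical `with_retries` helper (not part of Vecstore or requests) that retries a call with exponential backoff and a little random jitter. It assumes every exception in `retry_on` is worth retrying; the default `(OSError,)` covers requests' exceptions, which subclass `IOError`. In production you'd likely narrow `retry_on` to transient failures such as connection errors and timeouts.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: tuple[type[BaseException], ...] = (OSError,),
) -> T:
    """Call fn(), retrying with exponential backoff plus jitter on the given exceptions."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the last error
            # wait base_delay, 2*base_delay, 4*base_delay, ... plus random jitter
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
    raise ValueError("attempts must be at least 1")
```

Usage looks like `results = with_retries(lambda: vecstore.search_by_text("red sneakers"))`. Pairing it with a `timeout=` argument on the `requests.post` calls keeps a stalled request from hanging forever instead of failing into the retry loop.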

Get started with Vecstore - free tier includes enough credits to build and test.