Under the Hood

How SolvedSeek actually works.

SolvedSeek is an independent, specialised search engine and directory for Shopify stores. No Google API. No Bing reskin. We discover Shopify stores from public crawl data, build our own index of their homepages, rank results with our own algorithms, and run our own AI models for understanding what pages are actually about. Here's how it all fits together.

The Architecture

Three technologies, one stack. Everything runs on a single server with no cloud functions, no third-party APIs, and no external dependencies at runtime.

PHP 8.2+

Handles every search query, renders pages, and manages the web interface. Custom-built from scratch, no frameworks.

Node.js Workers

Background processes that discover Shopify stores, verify storefronts, render JavaScript pages, generate AI embeddings, and calculate rankings.

MySQL / MariaDB

Stores every page, link, embedding, and trust score. FULLTEXT indexes power keyword matching at speed.

No frameworks. The search engine, router, template engine, and database layer are all custom-written. No Laravel, no Express, no React. Every line of code is purpose-built for search.

The Crawl Pipeline

Every page in our index goes through a five-stage pipeline. Here's what happens from discovery to ranking.

Discover

New URLs are found through link following, XML sitemaps, and manual submissions. Each URL enters a priority queue where homepages and submitted sites get crawled first. Deeper pages wait their turn.
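As a rough illustration of the priority queue described above, here is a minimal sketch. The class name, the two priority levels, and the rule that a URL with an empty path counts as a homepage are all assumptions made for the example, not a description of our production worker.

```python
import heapq
import itertools
from urllib.parse import urlsplit

# Hypothetical priority levels: a lower number is crawled sooner.
PRIORITY_FIRST = 0  # homepages and manually submitted URLs
PRIORITY_DEEP = 1   # everything else waits its turn


class CrawlQueue:
    """Priority queue of URLs; ties are broken by insertion order."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._counter = itertools.count()

    def add(self, url, submitted=False):
        """Queue a URL once; return False if it was already seen."""
        if url in self._seen:
            return False
        self._seen.add(url)
        is_homepage = urlsplit(url).path in ("", "/")
        priority = PRIORITY_FIRST if (submitted or is_homepage) else PRIORITY_DEEP
        heapq.heappush(self._heap, (priority, next(self._counter), url))
        return True

    def pop(self):
        """Return the next URL to crawl."""
        _priority, _order, url = heapq.heappop(self._heap)
        return url

    def __len__(self):
        return len(self._heap)
```

The insertion counter keeps the ordering stable, so two URLs with the same priority come out in the order they were discovered.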

Crawl

Three parallel crawl workers fetch pages over HTTPS, always checking robots.txt first. We pull out the title, description, body text, and all outbound links. Pages behind CAPTCHAs or bot walls get detected and skipped gracefully.
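The extraction step can be sketched with Python's standard-library HTML parser. This is only an illustration: the class name is hypothetical, body-text extraction is left out for brevity, and "outbound" is assumed here to mean an http(s) link pointing at a different host.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class PageExtractor(HTMLParser):
    """Collects the title, meta description and outbound links of a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.description = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            # Resolve relative links, then keep only links to other hosts.
            absolute = urljoin(self.base_url, attrs["href"])
            parts = urlsplit(absolute)
            base_host = urlsplit(self.base_url).netloc
            if parts.scheme in ("http", "https") and parts.netloc != base_host:
                self.links.append(absolute)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
```

Calling `feed(html)` on an instance fills in `title`, `description` and `links` for that page.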

Render

A lot of modern websites load their content with JavaScript. When our crawler spots a JS-heavy page, it gets sent to a render queue where a headless Chromium browser loads it fully, just like a real visitor would, and extracts the final content.

Understand

Each page goes through three layers of understanding. First, language detection uses trigram analysis to figure out what language the page is written in. Second, entity extraction uses NLP to identify the primary topic, whether that's a company, place, or person. Third, a local AI model converts the text into a mathematical "meaning fingerprint" for semantic search.
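To give a feel for the first layer, here is a toy version of trigram language detection. The sample sentences, the three language codes, and the use of cosine similarity between trigram counts are assumptions chosen for the example; a real detector would use much larger reference profiles.

```python
import math
from collections import Counter

# Tiny, hypothetical reference texts; real profiles come from large corpora.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and this is the shop "
          "for all the things you need",
    "de": "der schnelle braune fuchs springt über den faulen hund und das ist "
          "der laden für alle dinge die sie brauchen",
    "es": "el rápido zorro marrón salta sobre el perro perezoso y esta es la "
          "tienda para todas las cosas que necesitas",
}


def trigram_profile(text):
    """Count overlapping three-character sequences, ignoring non-letters."""
    letters = "".join(ch.lower() if ch.isalpha() else " " for ch in text)
    padded = " " + " ".join(letters.split()) + " "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two trigram count profiles."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


PROFILES = {lang: trigram_profile(text) for lang, text in SAMPLES.items()}


def detect_language(text):
    """Return the language whose trigram profile is closest to the text."""
    query = trigram_profile(text)
    return max(PROFILES, key=lambda lang: cosine(query, PROFILES[lang]))
```

Short strings share few trigrams with any profile, so a detector like this is more reliable on full page text than on a title alone.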

Rank

At search time, matching stores are scored on a transparent blend of keyword relevance, semantic meaning, domain authority (Ahrefs DR) and listing completeness, then ordered. Nothing is pre-baked or pay-to-rank.

How Ranking Works

Ranking happens in two stages. First a keyword (FULLTEXT) search retrieves the store homepages most relevant to your query. Then those candidates are re-ordered by a single transparent score whose weights add up to 100% — no hidden penalties, and nothing can be bought.

Text relevance · 50%

How well your search matches the store’s title and homepage text. The title counts for more, and scoring is log-damped so keyword stuffing has diminishing returns.

Semantic similarity · 30%

How close your query is in meaning to the store, via a local embedding model — so “eco-friendly trainers” can match a sustainable shoe brand without the exact words.

Domain authority · 15%

A light boost from the store’s Domain Rating (0–100). A relevant small store still outranks an irrelevant big one — authority is a tie-breaker, not a gate. Domain Rating by Ahrefs.

Listing completeness · 5%

A small nudge for stores with a full, useful description, so well-presented listings edge ahead.
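The log damping mentioned under text relevance can be illustrated in a few lines. The title and body weights below are made-up values for the sketch, and `math.log1p` is just one reasonable damping function, not necessarily the one we run.

```python
import math

# Hypothetical weights: a hit in the title counts for more than a hit in the body.
TITLE_WEIGHT = 3.0
BODY_WEIGHT = 1.0


def damped_text_score(title_hits, body_hits):
    """Combine keyword hits, then damp with log(1 + x).

    Each extra repetition of a keyword adds less than the one before it,
    which is what makes keyword stuffing unrewarding.
    """
    raw = TITLE_WEIGHT * title_hits + BODY_WEIGHT * body_hits
    return math.log1p(raw)
```

Under these assumptions, going from 1 to 2 body mentions adds far more to the score than going from 50 to 51.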

No penalties, no paid placement. Relevance (text + meaning) is ~80% of the score; authority and completeness are light tie-breakers. We don’t demote “unknown” stores, and ranking can’t be bought.
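Put together, the four weights amount to a simple weighted sum. The sketch below uses the percentages stated above; the `Candidate` fields and the assumption that each signal is already normalised to the range 0 to 1 (with Domain Rating divided by 100) are ours for the purpose of the example.

```python
from dataclasses import dataclass

# Weights as stated above; they add up to 1.0 (100%).
WEIGHTS = {
    "text": 0.50,
    "semantic": 0.30,
    "authority": 0.15,
    "completeness": 0.05,
}


@dataclass
class Candidate:
    domain: str
    text_relevance: float       # assumed normalised to 0..1
    semantic_similarity: float  # assumed normalised to 0..1
    domain_rating: int          # Ahrefs DR, 0..100
    completeness: float         # assumed normalised to 0..1


def final_score(c):
    """Blend the four signals into a single score between 0 and 1."""
    return (
        WEIGHTS["text"] * c.text_relevance
        + WEIGHTS["semantic"] * c.semantic_similarity
        + WEIGHTS["authority"] * (c.domain_rating / 100)
        + WEIGHTS["completeness"] * c.completeness
    )


def rank(candidates):
    """Order candidates from best to worst score."""
    return sorted(candidates, key=final_score, reverse=True)
```

With these numbers, a store scoring 0.9 on text and 0.8 on meaning with a Domain Rating of 10 still comes out well ahead of a store scoring 0.2 and 0.1 with a Domain Rating of 95.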

Browsing with no query. Filter without search terms (e.g. by industry) and results are ordered by Domain Rating.

Domain diversity. No single domain dominates — results are capped at 2 per root domain.
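The diversity cap is easy to picture as a single pass over the ranked list. In this sketch the function name is hypothetical and each result is assumed to arrive as a `(root_domain, url)` pair with the root domain already worked out.

```python
from collections import defaultdict


def cap_per_domain(ranked_results, max_per_domain=2):
    """Keep at most `max_per_domain` results per root domain.

    `ranked_results` is an iterable of (root_domain, url) pairs, best first.
    The original order of the surviving results is preserved.
    """
    counts = defaultdict(int)
    kept = []
    for root_domain, url in ranked_results:
        if counts[root_domain] < max_per_domain:
            counts[root_domain] += 1
            kept.append((root_domain, url))
    return kept
```

Because the list is walked best-first, the results that survive for each domain are always its two highest-scoring ones.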

AI & Semantic Search

Every store’s homepage is converted by a local AI model (all-MiniLM-L6-v2) into a 384-dimension “meaning fingerprint”. Your query gets the same treatment, and the similarity between the two feeds 30% of the ranking score — so meaning matches, not just keywords. Search spans all languages by default; we don’t filter out non-English stores. No data leaves our hardware — queries are never sent to OpenAI, Google or any external service.
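Comparing two fingerprints is plain vector arithmetic. The sketch below assumes cosine similarity, a common choice for sentence embeddings, and uses short made-up vectors in the usage example; real vectors from all-MiniLM-L6-v2 would have 384 dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (-1..1)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector carries no meaning to compare
    return dot / (norm_a * norm_b)


# Toy 4-dimension vectors standing in for real embeddings (made-up values).
query_vec = [0.9, 0.1, 0.0, 0.2]    # e.g. "eco-friendly trainers"
shoe_store = [0.8, 0.2, 0.1, 0.3]   # a sustainable shoe brand
candle_shop = [0.0, 0.1, 0.9, 0.1]  # an unrelated store

closest = max(
    [("shoe_store", shoe_store), ("candle_shop", candle_shop)],
    key=lambda item: cosine_similarity(query_vec, item[1]),
)[0]
```

In this toy setup the shoe store's vector points in nearly the same direction as the query, so it wins even though no keywords are compared at all.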

Authority & Curation

We gauge a store’s standing with an external, independent signal rather than a number we invent ourselves.

Domain Rating (Ahrefs). Each store carries its Ahrefs Domain Rating (0–100), a widely used measure of backlink authority. It contributes a modest 15% to ranking and is shown on every listing.

Editorial controls. Admins can pin, promote, demote or block specific stores for spam control and curation. These manual adjustments (“twiddlers”) are applied transparently on top of the score and are never sold.

Ethical Crawling

Search engines should be good citizens of the web. Our crawler follows the rules.

robots.txt. Fully respected. If you say "don't crawl", we don't crawl.

Crawl-delay. Honoured. We never hit a site faster than it allows.

Meta robots. noindex, nofollow, and none are all supported.

Canonicals. We follow canonical tags to avoid duplicate content.

Identification. Our bot identifies itself as SolvedSeekBot/1.0 in every request.

Sitemaps. We read and follow XML sitemaps declared in robots.txt.
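For readers who want to see what these checks look like in practice, here is a small sketch using Python's standard `urllib.robotparser`. The helper names and the one-second fallback delay are assumptions for the example; only the SolvedSeekBot/1.0 user agent comes from the text above.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "SolvedSeekBot/1.0"


def load_rules(robots_txt):
    """Parse the text of a robots.txt file that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def crawl_decision(parser, url, default_delay=1.0):
    """Return (allowed, delay_in_seconds) for a URL.

    `default_delay` is a hypothetical fallback used when the site
    does not declare a Crawl-delay of its own.
    """
    allowed = parser.can_fetch(USER_AGENT, url)
    delay = parser.crawl_delay(USER_AGENT)
    return allowed, (default_delay if delay is None else float(delay))


def declared_sitemaps(parser):
    """List the Sitemap URLs declared in robots.txt (empty if none)."""
    return parser.site_maps() or []
```

A crawler would call `crawl_decision` before every request and sleep for the returned delay between hits to the same host.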

By The Numbers

Live data from our index. These numbers update every time you load this page.

Pages Indexed: 32,240

Domains Crawled: 32,240

Searches Served: 693

AI Embeddings: 8,479

Links in Graph:

AI Coverage: 26.3%

The Journey

Building a search engine from scratch is one of the most ambitious projects in software engineering. It touches everything: networking, distributed systems, natural language processing, machine learning, information retrieval, web standards, and database engineering.

SolvedSeek started with a question: is it actually possible to build a real, independent search engine without being Google? Not a meta-search engine that queries someone else's API. Not a Bing reskin with a privacy label. A genuine, independent, build-your-own-index search engine — now focused entirely on Shopify stores.

The answer is yes, but it takes a lot of work. Every part of this system was built, tested, broken, rebuilt, and refined. The crawl pipeline alone went through dozens of iterations before it could reliably handle thousands of pages per day across thousands of domains.

This project has been one of the best learning experiences of my career. I've learned more about how the web actually works (from robots.txt edge cases to DNS resolution quirks to the surprising complexity of HTML parsing) than years of building websites ever taught me.

What's Next

SolvedSeek is a living project. Here's what we're working towards.

Larger Index

Continuously expanding our index to cover more Shopify stores, with smarter prioritisation of high-quality storefronts.

Smarter AI

Exploring larger embedding models and deeper semantic understanding to make search results even more relevant.

Webmaster Tools

Building tools for site owners to see how their pages appear in our index and engage with the search engine directly.

Performance

Faster search, faster crawling, and more efficient infrastructure to handle growth as the index scales.

Run a Shopify store?

Submit your Shopify store and we'll add it. Simple as that.


Page generated Jun 28, 2026 at 7:47 AM UTC