Under the Hood
How SolvedSeek actually works.
SolvedSeek is an independent, specialised search engine and directory for Shopify stores. No Google API. No Bing reskin. We discover Shopify stores from public crawl data, build our own index of their homepages, rank results with our own algorithms, and run our own AI models for understanding what pages are actually about. Here's how it all fits together.
The Architecture
Three technologies, one stack. Everything runs on a single server with no cloud functions, no third-party APIs, and no external dependencies at runtime.
PHP 8.2+
Handles every search query, renders pages, and manages the web interface. Custom-built from scratch, no frameworks.
Node.js Workers
Background processes that discover Shopify stores, verify storefronts, render JavaScript pages, generate AI embeddings, and calculate rankings.
MySQL / MariaDB
Stores every page, link, embedding, and trust score. FULLTEXT indexes power keyword matching at speed.
No frameworks. The search engine, router, template engine, and database layer are all custom-written. No Laravel, no Express, no React. Every line of code is purpose-built for search.
The Crawl Pipeline
Every page in our index goes through a five-stage pipeline. Here's what happens from discovery to ranking.
Discover
New URLs are found through link following, XML sitemaps, and manual submissions. Each URL enters a priority queue where homepages and submitted sites get crawled first. Deeper pages wait their turn.
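Here is a minimal sketch of such a priority queue in Python. The tier numbers, the `CrawlFrontier` class and the use of URL path depth as a secondary key are all assumptions for illustration. The page only says that homepages and submitted sites are crawled first and deeper pages wait.

```python
import heapq
import itertools
from urllib.parse import urlparse

# Hypothetical tiers: lower numbers are popped first.
SUBMITTED, HOMEPAGE, DEEP = 0, 1, 2


class CrawlFrontier:
    """Priority queue of URLs waiting to be crawled (illustrative sketch)."""

    def __init__(self) -> None:
        self._heap = []                   # entries: (tier, depth, insertion order, url)
        self._seen = set()
        self._order = itertools.count()   # tie-breaker: first in, first out within a tier

    def add(self, url: str, submitted: bool = False) -> None:
        if url in self._seen:
            return  # never queue the same URL twice
        self._seen.add(url)
        path = urlparse(url).path.strip("/")
        depth = path.count("/") + 1 if path else 0
        if submitted:
            tier = SUBMITTED
        elif depth == 0:
            tier = HOMEPAGE
        else:
            tier = DEEP  # deeper pages wait their turn, shallowest first
        heapq.heappush(self._heap, (tier, depth, next(self._order), url))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[-1]

    def __len__(self) -> int:
        return len(self._heap)


frontier = CrawlFrontier()
frontier.add("https://store-a.example/collections/shoes/trail-runner")
frontier.add("https://store-a.example/")
frontier.add("https://store-b.example/pages/about", submitted=True)
print([frontier.pop() for _ in range(len(frontier))])
```

The submitted URL comes out first, then the homepage, then the deep product page, regardless of the order in which they were added.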
Crawl
Three parallel crawl workers fetch pages over HTTPS, always checking robots.txt first. We pull out the title, description, body text, and all outbound links. Pages behind CAPTCHAs or bot walls get detected and skipped gracefully.
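A robots.txt check before each fetch can be sketched with Python's standard `urllib.robotparser`. The `allowed` helper and the sample rules are hypothetical. Only the user-agent string comes from this page.

```python
from urllib import robotparser

USER_AGENT = "SolvedSeekBot/1.0"  # the crawler's user-agent, as stated on this page


def allowed(robots_txt: str, url: str, user_agent: str = USER_AGENT) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse already-downloaded text; no network call
    return parser.can_fetch(user_agent, url)


ROBOTS_TXT = """\
User-agent: *
Disallow: /cart
Disallow: /checkout
"""

print(allowed(ROBOTS_TXT, "https://shop.example/products/tee"))  # True
print(allowed(ROBOTS_TXT, "https://shop.example/checkout"))      # False
```

A store can also add a group that names the bot specifically. Under standard robots.txt rules, that group replaces the `*` group for that bot rather than adding to it.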
Render
A lot of modern websites load their content with JavaScript. When our crawler spots a JS-heavy page, it gets sent to a render queue where a headless Chromium browser loads it fully, just like a real visitor would, and extracts the final content.
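The page doesn't say how the crawler decides a page is JS-heavy. One plausible heuristic flags pages that have very little visible text but several `<script>` tags. The sketch below uses Python's standard `html.parser`. The `needs_rendering` function and both thresholds are hypothetical.

```python
from html.parser import HTMLParser


class _TextAndScriptCounter(HTMLParser):
    """Counts visible text characters and <script> tags in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.text_chars = 0
        self.scripts = 0
        self._hidden_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._hidden_depth += 1
            if tag == "script":
                self.scripts += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._hidden_depth:
            self._hidden_depth -= 1

    def handle_data(self, data):
        if not self._hidden_depth:  # ignore script and style bodies
            self.text_chars += len(data.strip())


def needs_rendering(html: str, min_text_chars: int = 200, min_scripts: int = 3) -> bool:
    """Hypothetical rule: little visible text plus several scripts suggests client-side rendering."""
    counter = _TextAndScriptCounter()
    counter.feed(html)
    counter.close()
    return counter.text_chars < min_text_chars and counter.scripts >= min_scripts


spa = "<html><body><div id='root'></div>" + "<script src='a.js'></script>" * 3 + "</body></html>"
print(needs_rendering(spa))  # True: empty shell, three scripts
```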
Understand
Each page goes through three layers of understanding. First, language detection uses trigram analysis to figure out what language the page is written in. Second, entity extraction uses NLP to identify the primary topic, whether that's a company, place, or person. Third, a local AI model converts the text into a mathematical "meaning fingerprint" for semantic search.
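Trigram-based language detection can be illustrated by comparing character-trigram profiles. This toy version scores a text against two tiny reference profiles using cosine similarity. Production detectors build their profiles from large corpora, and the classic Cavnar and Trenkle method uses a rank-order ("out-of-place") distance instead. Treat the sample profiles and the choice of cosine scoring as assumptions, not as SolvedSeek's detector.

```python
import math
from collections import Counter


def trigram_profile(text: str) -> Counter:
    """Character trigram counts of lower-cased text, with spaces marking word boundaries."""
    cleaned = " " + " ".join(text.lower().split()) + " "
    return Counter(cleaned[i:i + 3] for i in range(len(cleaned) - 2))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[gram] for gram, count in a.items())  # missing grams count as 0
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Toy reference profiles; real ones come from much larger text samples.
PROFILES = {
    "en": trigram_profile("the quick brown fox jumps over the lazy dog and the cat sat on the mat with the hat"),
    "de": trigram_profile("der schnelle braune fuchs springt über den faulen hund und die katze sitzt auf der matte"),
}


def detect_language(text: str) -> str:
    query = trigram_profile(text)
    return max(PROFILES, key=lambda lang: cosine(query, PROFILES[lang]))


print(detect_language("the dog and the cat"))     # en
print(detect_language("der hund und die katze"))  # de
```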
Rank
At search time, matching stores are scored on a transparent blend of keyword relevance, semantic meaning, domain authority (Ahrefs DR) and listing completeness, then ordered. Nothing is pre-baked or pay-to-rank.
How Ranking Works
Ranking happens in two stages. First a keyword (FULLTEXT) search retrieves the store homepages most relevant to your query. Then those candidates are re-ordered by a single transparent score whose weights add up to 100% — no hidden penalties, and nothing can be bought.
Keyword relevance. How well your search matches the store’s title and homepage text. The title counts for more, and scoring is log-damped, so keyword stuffing has diminishing returns.
Semantic meaning. How close your query is in meaning to the store, measured with a local embedding model, so “eco-friendly trainers” can match a sustainable shoe brand without the exact words.
Domain authority. A light boost from the store’s Domain Rating (0–100). A relevant small store still outranks an irrelevant big one: authority is a tie-breaker, not a gate. Domain Rating by Ahrefs.
Listing completeness. A small nudge for stores with a full, useful description, so well-presented listings edge ahead.
No penalties, no paid placement. Relevance (text + meaning) is ~80% of the score; authority and completeness are light tie-breakers. We don’t demote “unknown” stores, and ranking can’t be bought.
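Here is a sketch of the second stage, re-ranking the FULLTEXT candidates. Only two of the weights are stated on this page: semantic 30% and authority 15%. The 50% text and 5% completeness figures are inferred from "relevance is ~80%" and the 100% total. The title boost, per-result-set normalisation, `Candidate` fields and sample stores are hypothetical.

```python
import math
from dataclasses import dataclass

# Assumed weights: semantic 30% and authority 15% are stated; text 50% and
# completeness 5% are inferred from "relevance (text + meaning) is ~80%".
WEIGHTS = {"text": 0.50, "semantic": 0.30, "authority": 0.15, "completeness": 0.05}
TITLE_BOOST = 3.0  # hypothetical: a title match counts three times a body match


@dataclass
class Candidate:
    domain: str
    title_hits: int        # query-term matches in the store title
    body_hits: int         # query-term matches in the homepage text
    semantic: float        # embedding similarity, assumed already scaled to 0..1
    domain_rating: float   # Ahrefs DR, 0..100
    has_description: bool  # listing completeness, simplified to yes/no


def text_score(c: Candidate) -> float:
    # log1p damping: going from 1 to 50 repetitions adds far less than going from 0 to 1
    return TITLE_BOOST * math.log1p(c.title_hits) + math.log1p(c.body_hits)


def blended_score(c: Candidate, max_text: float) -> float:
    text = text_score(c) / max_text if max_text else 0.0  # normalise text to 0..1 in this result set
    return (WEIGHTS["text"] * text
            + WEIGHTS["semantic"] * c.semantic
            + WEIGHTS["authority"] * c.domain_rating / 100
            + WEIGHTS["completeness"] * (1.0 if c.has_description else 0.0))


def rerank(candidates: list) -> list:
    """Re-order FULLTEXT candidates by the blended score, highest first."""
    max_text = max((text_score(c) for c in candidates), default=0.0)
    return sorted(candidates, key=lambda c: blended_score(c, max_text), reverse=True)


small = Candidate("small-eco-shoes.example", 2, 5, 0.8, 5, True)
big = Candidate("big-brand.example", 0, 1, 0.2, 90, True)
print([c.domain for c in rerank([big, small])])  # ['small-eco-shoes.example', 'big-brand.example']
```

With these made-up numbers, the small store's stronger text and semantic match (about 0.80 overall) outweighs the big store's DR 90 (about 0.31 overall). This matches the "tie-breaker, not a gate" behaviour described above.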
Browsing with no query. Filter without search terms (e.g. by industry) and results are ordered by Domain Rating.
Domain diversity. No single domain dominates — results are capped at 2 per root domain.
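A sketch of the two-per-domain cap applied to an already-ranked list. The `root_domain` helper is a deliberate simplification that keeps the last two host labels. A real implementation would need the Public Suffix List so that hosts such as `shop.example.co.uk` map to `example.co.uk`. Both helper names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def root_domain(url: str) -> str:
    """Naive root domain: the last two labels of the host name (ignores multi-part suffixes)."""
    host = (urlparse(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


def cap_per_domain(ranked_urls, max_per_domain: int = 2):
    """Keep ranking order, but skip results once a root domain already has max_per_domain."""
    counts = defaultdict(int)
    kept = []
    for url in ranked_urls:
        root = root_domain(url)
        if counts[root] < max_per_domain:
            counts[root] += 1
            kept.append(url)
    return kept


ranked = ["https://a.example/", "https://shop.a.example/", "https://www.a.example/x", "https://b.example/"]
print(cap_per_domain(ranked))  # ['https://a.example/', 'https://shop.a.example/', 'https://b.example/']
```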
AI & Semantic Search
Every store’s homepage is converted by a local AI model (all-MiniLM-L6-v2) into a 384-dimension “meaning fingerprint”. Your query gets the same treatment, and the similarity between the two feeds 30% of the ranking score — so meaning matches, not just keywords. Search spans all languages by default; we don’t filter out non-English stores. No data leaves our hardware — queries are never sent to OpenAI, Google or any external service.
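The comparison between a query fingerprint and a store fingerprint is typically cosine similarity, sketched below with toy 4-dimensional vectors. The real all-MiniLM-L6-v2 vectors have 384 dimensions. The vector values are made up, and the page doesn't say which similarity measure or scaling SolvedSeek applies before the 30% weighting, so that part is an assumption.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two embedding vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction
    return dot / (norm_a * norm_b)


# Toy vectors with made-up values; real embeddings are 384-dimensional.
query = [0.9, 0.1, 0.0, 0.2]     # e.g. "eco-friendly trainers"
store_a = [0.8, 0.2, 0.1, 0.3]   # a sustainable shoe brand
store_b = [0.0, 0.1, 0.9, 0.0]   # an unrelated store
print(cosine_similarity(query, store_a) > cosine_similarity(query, store_b))  # True
```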
Authority & Curation
We gauge a store’s standing with an external, independent signal rather than a number we invent ourselves.
Domain Rating (Ahrefs). Each store carries its Ahrefs Domain Rating (0–100), a widely used measure of backlink authority. It contributes a modest 15% to ranking and is shown on every listing.
Editorial controls. Admins can pin, promote, demote or block specific stores (“twiddlers”) for spam control and curation — applied transparently on top of the score, never sold.
Ethical Crawling
Search engines should be good citizens of the web. Our crawler follows the rules.
User-Agent
We identify ourselves as SolvedSeekBot/1.0 in every request.
Sitemaps
We read and follow XML sitemaps declared in robots.txt.
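Sitemaps declared in robots.txt can be read with the standard library. `RobotFileParser.site_maps()` (Python 3.8+) returns the `Sitemap:` URLs, and `xml.etree.ElementTree` pulls the `<loc>` entries out of a sitemap file. The helper names and sample data are hypothetical. The `.//sm:loc` query matches entries in both plain sitemaps and sitemap index files.

```python
import xml.etree.ElementTree as ET
from urllib import robotparser

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemaps_from_robots(robots_txt: str) -> list:
    """Return the Sitemap: URLs declared in a robots.txt body."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []  # site_maps() returns None when none are declared


def urls_from_sitemap(xml_text: str) -> list:
    """Return every <loc> URL in a sitemap or sitemap index document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]


robots = "User-agent: *\nDisallow: /cart\nSitemap: https://shop.example/sitemap.xml\n"
sitemap = ('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
           "<url><loc>https://shop.example/</loc></url>"
           "<url><loc>https://shop.example/products/tee</loc></url></urlset>")
print(sitemaps_from_robots(robots))  # ['https://shop.example/sitemap.xml']
print(urls_from_sitemap(sitemap))    # ['https://shop.example/', 'https://shop.example/products/tee']
```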
By The Numbers
Live data from our index. These numbers update every time you load this page.
Pages Indexed: 32,240
Domains Crawled: 32,240
Searches Served: 693
AI Embeddings: 8,479
Links in Graph: 0
AI Coverage: 26.3%
The Journey
Building a search engine from scratch is one of the most ambitious projects in software engineering. It touches everything: networking, distributed systems, natural language processing, machine learning, information retrieval, web standards, and database engineering.
SolvedSeek started with a question: is it actually possible to build a real, independent search engine without being Google? Not a meta-search engine that queries someone else's API. Not a Bing reskin with a privacy label. A genuine, independent, build-your-own-index search engine — now focused entirely on Shopify stores.
The answer is yes, but it takes a lot of work. Every part of this system was built, tested, broken, rebuilt, and refined. The crawl pipeline alone went through dozens of iterations before it could reliably handle thousands of pages per day across thousands of domains.
This project has been one of the best learning experiences of my career. I've learned more about how the web actually works (from robots.txt edge cases to DNS resolution quirks to the surprising complexity of HTML parsing) than years of building websites ever taught me.
What's Next
SolvedSeek is a living project. Here's what we're working towards.
Continuously expanding our index to cover more Shopify stores, with smarter prioritisation of high-quality storefronts.
Exploring larger embedding models and deeper semantic understanding to make search results even more relevant.
Building tools for site owners to see how their pages appear in our index and engage with the search engine directly.
Faster search, faster crawling, and more efficient infrastructure to handle growth as the index scales.
Page generated Jun 28, 2026 at 7:47 AM UTC