Self-hosting an AI visibility tracker

Your AI visibility data (every probe, every answer, every citation an engine returned about your brand) is worth owning outright, because it's the raw record of how the models talk about you over time. This article covers what runs under the hood: the data model that stores it, the pipeline that produces it, and why a tracker like this should be open-core so you can run it next to your own stack.

Key takeaways

The heart of an AI visibility tracker is a raw, append-only probe store — keep every answer, aggregate at query time.
The pipeline is simple: a scheduler enqueues one job per prompt × engine, each engine adapter calls an official API, and a shared parser scores the result.
One store feeds three doors — a query API, a CLI, and an MCP server — so the same data serves dashboards, CI, and agents.
Open-core means you can self-host for free and keep your data portable, or pay for managed cloud purely for convenience.

Why self-host

The first reason to self-host an AI visibility tracker is plain ownership. Every probe you run is a small, dated fact about how an engine described your category — and over months those facts become a history you can't reconstruct later. If that history lives only inside someone else's SaaS, you're renting your own measurements. Running the tracker yourself means the raw answers land in your database, on your infrastructure, under your retention rules.

That ownership buys a few concrete things. There's no lock-in: the data is in a schema you control, so you can query it directly, join it against your CRM, or walk away without an export ritual. There's retention on your terms — keep ten years of probes or prune to ninety days, your call, not a plan tier's. There's privacy: the prompts you probe with often reveal your positioning and competitive set, and self-hosting keeps that intent on hardware you trust. And there's proximity — running it next to your existing stack means the same observability, secrets management, and access controls you already use, with no new vendor to onboard.

The data model

The design decision that matters most is to store raw results, not summaries. The core of the system is a single append-only probe-results store: every time the tracker probes one prompt against one engine, it writes one immutable row capturing exactly what came back. You never overwrite or recompute these rows — metrics like visibility rate and share-of-voice are derived from them at query time, so you can change how you measure without losing the underlying evidence.

A single probe result holds roughly these fields:

probe_result
  project_id        # which tracked brand / workspace
  prompt_id         # which prompt in the project's set
  run_id            # the scheduled batch this probe belonged to
  engine            # chatgpt | claude | perplexity
  model             # the specific model used for this probe
  probed_at         # timestamp of the API call
  answer_text       # the full raw answer returned
  brand_mentioned   # bool: was the brand named in the prose?
  brand_cited       # bool: was the brand's domain in the citations?
  citation_rank     # position of the brand's citation, if any
  competitors[]     # competitor brands detected in the answer
  sentiment         # tone toward the brand: pos | neutral | neg
  input_tokens      # tokens sent to the engine
  output_tokens     # tokens returned
  cost              # computed cost of this single probe

Because each row is self-contained and immutable, aggregation is purely a read concern. "Visibility rate this month" is the count of rows where brand_mentioned is true divided by the total rows in the window; share-of-voice compares your mention count with the competitors[] tallies on the same prompts. Keeping the raw answer_text alongside the parsed booleans also means you can re-parse history later: if your detection logic improves, you replay it over old answers instead of re-probing.
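As a rough illustration of "aggregate at query time", here is a minimal Python sketch of the row and two read-side metrics. The class and function names are hypothetical, and the share-of-voice formula (your mentions divided by your mentions plus competitor mentions) is one reasonable assumption, not a definition taken from the tracker.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)  # frozen mirrors the "immutable row" rule
class ProbeResult:
    project_id: str
    prompt_id: str
    run_id: str
    engine: str
    model: str
    probed_at: datetime
    answer_text: str
    brand_mentioned: bool
    brand_cited: bool
    citation_rank: Optional[int]  # None when the brand was not cited
    competitors: tuple[str, ...]
    sentiment: str
    input_tokens: int
    output_tokens: int
    cost: float


def utc(year: int, month: int, day: int) -> datetime:
    """Small helper for building timezone-aware window bounds."""
    return datetime(year, month, day, tzinfo=timezone.utc)


def _window(rows, start, end):
    # Half-open window [start, end) so adjacent periods never double-count.
    return [r for r in rows if start <= r.probed_at < end]


def visibility_rate(rows, start, end) -> float:
    """Share of probes in the window where the brand was named."""
    window = _window(rows, start, end)
    if not window:
        return 0.0
    return sum(r.brand_mentioned for r in window) / len(window)


def share_of_voice(rows, start, end) -> float:
    """Assumed formula: own mentions / (own + competitor mentions)."""
    window = _window(rows, start, end)
    own = sum(r.brand_mentioned for r in window)
    rivals = sum(len(r.competitors) for r in window)
    total = own + rivals
    return own / total if total else 0.0
```

Because both functions only read rows, swapping in a different formula later changes the numbers without touching the stored evidence.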

The probing pipeline

The pipeline that fills that store is deliberately small. A scheduler wakes on each project's cadence and fans the work out: it enqueues one job per (prompt × engine) pair, so a project with twenty prompts across three engines produces sixty independent jobs per run. Each job is isolated, retryable, and cheap to reason about.
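The fan-out can be sketched in a few lines of Python. This is an illustration only: `ProbeJob` and `fan_out` are hypothetical names, and a real scheduler would push each job onto a queue rather than return a list.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ProbeJob:
    project_id: str
    run_id: str
    prompt_id: str
    engine: str


def fan_out(project_id, run_id, prompt_ids, engines):
    """Build one independent job per (prompt, engine) pair."""
    return [
        ProbeJob(project_id, run_id, prompt_id, engine)
        for prompt_id, engine in product(prompt_ids, engines)
    ]


jobs = fan_out(
    "acme",
    "run-001",
    [f"prompt-{i}" for i in range(20)],
    ["chatgpt", "claude", "perplexity"],
)
print(len(jobs))  # 20 prompts x 3 engines = 60 jobs
```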

Each job is picked up by an engine adapter — one per engine — whose only responsibility is to call that engine's official API with the prompt and return a normalized response. Adapters hide the differences between providers (how Claude, ChatGPT, and Perplexity each expose answers and citations) behind one internal shape. There's no scraping; everything goes through documented APIs, which is what keeps the data stable and the approach defensible.
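One way to picture the "one internal shape" is a small interface that every adapter satisfies. The sketch below is an assumption about how that could look, not the tracker's actual code: `NormalizedResponse`, `EngineAdapter`, and `FakeAdapter` are hypothetical names, and the fake adapter returns a canned answer where a real one would call the provider's official API.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class NormalizedResponse:
    engine: str
    model: str
    answer_text: str
    citations: tuple[str, ...]  # cited URLs, in the order the engine returned them
    input_tokens: int
    output_tokens: int


class EngineAdapter(Protocol):
    engine: str

    def probe(self, prompt: str) -> NormalizedResponse: ...


class FakeAdapter:
    """Stand-in adapter; a real one would call the provider's official API here."""

    engine = "fake"

    def probe(self, prompt: str) -> NormalizedResponse:
        answer = f"Canned answer to: {prompt}"
        return NormalizedResponse(
            engine=self.engine,
            model="fake-model-1",
            answer_text=answer,
            citations=("https://example.com/guide",),
            input_tokens=len(prompt.split()),   # crude word count, not real tokens
            output_tokens=len(answer.split()),
        )


def run_probe(adapter: EngineAdapter, prompt: str) -> NormalizedResponse:
    # The rest of the pipeline depends only on the shared shape, never on a provider.
    return adapter.probe(prompt)
```

A fake adapter like this is also handy for testing the parser and the store without spending on API calls.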

The normalized response then flows through a shared parser. This is the one place that decides whether the brand was mentioned, whether its domain was cited and at what rank, which competitors appeared, and the answer's sentiment. Centralizing this logic means every engine is scored the same way, and improvements apply everywhere at once. The parser's output, plus the token and cost figures from the adapter, becomes one probe_result row written to the store.

Cost is the thing to design around, since every probe is a paid API call. The pipeline runs with bounded concurrency and per-project limits (a cap on how many probes a project may spend per run and per period), so a misconfigured prompt set can't quietly burn through your API budget. These limits are policy applied on top of the same simple job queue, not a separate system.
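To make the two guardrails concrete, here is a hedged Python sketch: a budget object that admits at most a per-run and per-period number of probes, and a thread pool whose worker count bounds concurrency. `ProbeBudget` and `run_jobs` are hypothetical names, and the in-memory counter stands in for whatever persistent accounting a real deployment would use.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class ProbeBudget:
    per_run_limit: int
    per_period_limit: int
    used_this_period: int = 0  # in-memory only; a real tracker would persist this

    def admit(self, requested: int) -> int:
        """Reserve and return how many of the requested probes may run."""
        remaining = max(self.per_period_limit - self.used_this_period, 0)
        allowed = min(requested, self.per_run_limit, remaining)
        self.used_this_period += allowed
        return allowed


def run_jobs(jobs, probe, budget: ProbeBudget, max_workers: int = 4):
    """Run as many jobs as the budget admits, at most max_workers at a time."""
    allowed = budget.admit(len(jobs))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order in its results.
        return list(pool.map(probe, jobs[:allowed]))
```

With a per-run limit of 50 and a per-period limit of 80, a 60-job run executes 50 probes, the next run executes 30, and later runs execute none until the period resets.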

API, CLI and MCP on top

Once the data lands in one place, the interfaces are thin. The tracker exposes a single query-style API over the probe store: ask for visibility rate, citation rate, share-of-voice, or the raw answers behind any metric, scoped to a project and a time window. Everything else is built on this one surface.

On top of it sit two more doors. A CLI wraps the same API for scripting — drop it into CI to fail a build when visibility drops, or into a cron job to pull a weekly snapshot into your own warehouse. And an MCP server exposes the same queries to AI agents, so an assistant can read your live visibility ("am I being cited for the comparison prompts this week?") as a tool call rather than a dashboard a human has to open. Same data, three doors: a dashboard, a terminal, and an agent all read the identical store.
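The CI use case can be sketched as a small gate script. This is an illustration under assumptions: it supposes the CLI can print a JSON object with a `visibility_rate` field (that output shape is hypothetical), and `visibility_gate` is a made-up helper rather than part of the tracker.

```python
import json


def visibility_gate(current_json: str, baseline_json: str, max_drop: float = 0.10) -> int:
    """Return a process exit code: 1 if visibility fell by more than max_drop, else 0.

    Both arguments are JSON strings with an assumed "visibility_rate" field;
    max_drop is an absolute drop in the rate (0.10 means ten percentage points).
    """
    current = json.loads(current_json)["visibility_rate"]
    baseline = json.loads(baseline_json)["visibility_rate"]
    return 1 if (baseline - current) > max_drop else 0
```

In a CI step you would pass the result to `sys.exit`, so a large enough drop fails the build.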

Why open-core

All of this argues for an open-core model, and that's the honest position. The tracker itself — the store, the pipeline, the adapters, the API, the CLI, the MCP server — is open and free to self-host. If you're happy running a database, a job queue, and your own engine keys, you owe nothing and keep everything.

The managed cloud exists for convenience, not for leverage. It runs the same tracker software for people who'd rather not operate the queue, rotate keys, or watch the cost limits themselves; you pay for the hosting and the babysitting, not for access to your own numbers. Crucially, the data is portable either way: the schema is the same whether it's your Postgres or ours, so moving from cloud to self-hosted (or back) is an export, not a migration project. Open-core only works if leaving is easy. That's the point, and it's why there's no lock-in on either side.

Frequently asked questions

What do I need to run it?

The moving parts are modest: a relational database for the append-only probe store, a background job queue for the scheduler and workers, and the application itself. It's designed to sit next to a normal web stack — if you can run a Postgres database and a worker process, you can run the tracker. No specialized infrastructure is required.

Do I need my own engine API keys?

Yes. When self-hosting, you bring your own keys. Each engine adapter calls the provider's official API using credentials you supply, so you'll need accounts and API keys for the engines you want to track. Those keys stay on your infrastructure, and the usage is billed directly to you by each provider.

What does it cost to self-host?

The software is free to self-host; the real cost is dominated by the LLM and search API usage of the probes themselves. Every probe is a paid API call to an engine, so your bill scales with the number of prompts, the number of engines, and the cadence you run. Server costs for the database and workers are small by comparison. Per-project limits exist specifically to keep that probe spend predictable.
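For a back-of-the-envelope estimate, the scaling above reduces to one multiplication. The Python sketch below uses a hypothetical helper name, and the per-probe cost is a placeholder you would replace with your own measured average, not a real provider price.

```python
def monthly_probe_cost(prompts: int, engines: int, runs_per_month: int,
                       avg_cost_per_probe: float) -> tuple[int, float]:
    """Return (probe count, estimated spend) for one project in one month."""
    probes = prompts * engines * runs_per_month
    return probes, probes * avg_cost_per_probe


# Example: 20 prompts, 3 engines, one run per day for 30 days,
# with a placeholder average of 0.01 per probe (not a real price).
probes, spend = monthly_probe_cost(20, 3, 30, 0.01)
print(probes, round(spend, 2))
```

Halving the cadence or dropping an engine cuts the estimate proportionally, which is why those are the first knobs to turn when spend runs high.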

Can I export or migrate later?

Yes. The probe store is a schema you control, so your full history is available as plain rows you can dump, query, or move at any time. Because the cloud and self-hosted versions share the same data model, moving between them is an export and import, not a rebuild. Easy exit is a design goal, not an afterthought.

How is the cloud version different?

The cloud version runs the exact same tracker software; it's a convenience layer, not a different product. It handles hosting, the job queue, key rotation, and cost guardrails so you don't have to operate them. Your data follows the same portable schema, so you can start on cloud and move to self-hosted later, or vice versa, without lock-in.

Own your AI visibility data.

Run the tracker next to your own stack, keep every probe in a schema you control, and query it from a dashboard, your terminal, or an agent. Self-host it free, or let the cloud handle the plumbing — the data stays yours either way.
