The Agent Usability Report
What the scanner does, how the score is built from ~50 deterministic checks across five categories, and how to read the report it gives you back.
What it is
A free scanner at /usability-report. Paste any URL you’re authorized to scan; we give you a markdown report telling you whether an AI agent could actually use that business — discover it, call its API, pay for it, and rely on it in a workflow.
It is not an SEO audit. SEO asks “will Google rank you?” This asks “if a model decides to spend money on this site, can it succeed?”
Why it exists
Most websites today were built for humans clicking. The headless economy assumes the user is software. The gap between “looks fine in a browser” and “works for an agent” is wider than most operators realize. The report makes that gap legible.
How it works
The model never decides points. The engine runs the checks and fixes the score first; the model only writes the explanation afterward. Two scans of the same site produce the same number.
The score — 100 points across 5 categories (75 for informational sites)
| Category | Weight | Asks |
|---|---|---|
| Discovery | 25 | Can agents find your site, crawl it, and read it as markdown? |
| APIs & agent endpoints | 18 | Are there OpenAPI, MCP, WebMCP, or A2A surfaces an agent can call? |
| Content & semantics | 17 | If the agent fetches a page, can it actually parse what’s there? |
| Commerce & reliability | 25 | Can the agent buy, get an API key without sales, and rely on the service? |
| Security & trust | 15 | Is it safe to embed in an automated workflow? |
Each category is a list of small, mechanically verifiable checks (≥ 1,000 chars of body text in the raw HTML, a Strict-Transport-Security header present, /llms.txt returning 200, etc.). Each check passes, partially passes, or fails, and the points add up.
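The pass / partial / fail mechanics can be sketched as a pure function of a fetched page snapshot. This is illustrative only — the field names, point values, and data model below are invented, not the scanner's actual implementation:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Snapshot:
    # Hypothetical capture of one scan: page body, response headers,
    # and the HTTP status returned by /llms.txt.
    body_text: str
    headers: dict
    llms_txt_status: int

@dataclass
class Check:
    name: str
    points: int
    run: Callable[[Snapshot], float]  # 1.0 = pass, 0.5 = partial, 0.0 = fail

CHECKS = [
    Check("body_text_present", 3,
          lambda s: 1.0 if len(s.body_text) >= 1000 else 0.0),
    Check("hsts_header", 3,
          lambda s: 1.0 if "strict-transport-security"
          in {k.lower() for k in s.headers} else 0.0),
    Check("llms_txt", 3,
          lambda s: 1.0 if s.llms_txt_status == 200 else 0.0),
]

def score(snapshot: Snapshot) -> int:
    # Pure function of the snapshot: same input, same number, every time.
    return round(sum(c.points * c.run(snapshot) for c in CHECKS))
```

Because `score` has no model in the loop and no randomness, re-running it on an identical snapshot reproduces the number exactly.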
Emerging-protocol checks are weighted at 1 pt apiece by design. WebMCP, MCP discovery, A2A Agent Cards, agents.json, content negotiation for markdown — these are all live design work, not settled standards. Penalizing 3 pts per missing manifest would present the engine as more authoritative than a surface still in flux deserves. Stable surfaces (HTTPS, sitemap, JSON-LD, security headers, OAuth metadata, public API docs) carry the weight; emerging surfaces are flagged as opportunities without anchoring the score.
The four usability tiers
Below the score, four binary questions tell you which kind of site you’re looking at — independent of the number. Read them in journey order: an agent has to find the site before it can call it, has to be able to call it before it can credential itself, has to credential itself before it can run a workflow.
| Tier | Asks | Passes if |
|---|---|---|
| T1 — Discoverable | Can an agent discover this site? | At least 2 of: AI crawler directives, llms.txt, sitemap, OpenAPI |
| T2 — Callable | Anything for an agent to call? | APIs category earned ≥ 6 (e.g. public API docs page + discoverable API base URL) |
| T3 — Provisionable | Can it credential itself? | Self-serve API key generation found AND auth method documented |
| T4 — Operable | Can it run a workflow? | Machine-aligned pricing AND no bot challenge on first request |

| Tiers met | Label |
|---|---|
| 4 of 4 | headless-native |
| 3 of 4 | meaningfully headless |
| 2 of 4 | API-adjacent |
| 0–1 of 4 | not meaningfully headless |
A site can score well on polish and still be API-adjacent because it has no API. The label catches that.
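The tier-count-to-label mapping in the table above is simple enough to state as code (a sketch; the labels come straight from the table):

```python
def headless_label(tiers_met: int) -> str:
    # tiers_met counts how many of T1-T4 the site passes (0-4).
    if tiers_met == 4:
        return "headless-native"
    if tiers_met == 3:
        return "meaningfully headless"
    if tiers_met == 2:
        return "API-adjacent"
    return "not meaningfully headless"  # 0 or 1 tiers
```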
Hard caps — when the score is capped regardless
A few failures are bad enough that no amount of polish elsewhere should hide them:
| If… | Total capped at |
|---|---|
| Bot challenge or CAPTCHA on first request | 39 |
| No machine-callable surface at all | 49 |
| GET / returns 4xx/5xx to a bot | 49 |
| Security & trust < 3 | 69 |
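Caps clamp the additive total rather than subtracting points, so the tightest cap wins. A sketch of that logic, with the boolean flag names invented for illustration:

```python
def apply_caps(additive_total: int, *, bot_challenge: bool,
               no_callable_surface: bool, homepage_errors: bool,
               security_score: int) -> int:
    # Start uncapped; each firing condition lowers the ceiling.
    cap = 100
    if bot_challenge:                          # CAPTCHA on first request
        cap = min(cap, 39)
    if no_callable_surface or homepage_errors:  # nothing to call, or GET / fails
        cap = min(cap, 49)
    if security_score < 3:                     # Security & trust below 3
        cap = min(cap, 69)
    return min(additive_total, cap)
```

A site that earns 82 additive points but throws a CAPTCHA at the first request reports 39, which is why the cap callout names the uncapped total: removing the cap condition recovers the difference in one move.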
What the report looks like
Each failing check expands into an actionable card, and the quick wins are listed with the points each fix recovers:
- Add HSTS header (+3)
- Add /llms.txt (+3)
- Document rate limits (+2)
- Fix heading hierarchy (+2)
You also get a markdown download for filing or sharing internally.
How to read your report
A finished report is dense by design. Read it in this order:
- Score and tiers met — the headline number plus the X-of-4 tier count. Together they tell you “how polished” + “how headless”.
- What this means — a plain-English paragraph derived directly from which tiers your site passes, plus a per-tier table (T1 Discoverable / T2 Callable / T3 Provisionable / T4 Operable, ✓ or ✗ for your site, with a one-line “what’s missing” for each ✗).
- Cap callout — only present when a hard cap is suppressing your score. It names the cap value, the cause (e.g. “bot challenge or CAPTCHA on first request”), and what your additive total would be without the cap. Removing the cap condition is always the highest-leverage fix.
- Score breakdown — the five categories with how much each earned out of its weight.
- Quick wins — the top five highest-gain fixes ranked by points-recoverable.
- Per-category checks — every check, pass or fail, with an actionable card on each ✗ / △ row.
A worked example. Imagine a fictional events site, whats-on-la.org, that scores 18 / 75 with 1 of 4 tiers met — Commerce & reliability is marked n/a because no commerce surface was detected, so the denominator is 75, not 100. The “What this means” block tells you, in one paragraph: “Your site passes 1 of 4 usability tiers (T1 Discoverable). The table below shows where the gaps are.” The per-tier table shows T1 ✓ (“sitemap + AI-crawler directives present”) and T2 ✗ (“APIs category earned 0 / 18 (need ≥ 6)”); T3 and T4 read as “doesn’t apply to informational sites”. No cap callout appears because no cap fired. The category breakdown shows where the 18 points came from, and the quick-wins list ranks the cheapest improvements.
The reading order matters. Tiers met is the diagnosis; score is polish over the diagnosis; per-check rows are the punchlist. An 18/75 with 1 of 4 tiers met means the architecture is wrong and polish won’t help: you need a callable API. A 90/100 with all four tiers met means the architecture is right and polish will help: work the quick-wins list.
Informational and non-commercial sites are scored against a smaller denominator. When the scanner detects no pricing, signup, or API-docs surfaces, Commerce & reliability is marked n/a and the maximum drops from 100 to 75, so 75/75 is best-in-class. A perfect 75/75 reads exactly like a perfect 100/100: 100%, grade A+, headless-native — same percentage, same label. T3 (Provisionable) and T4 (Operable) read as “doesn’t apply” rather than ✗ for those sites.
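The denominator adjustment is just a percentage over a conditional maximum. A minimal sketch (grade thresholds other than “a perfect score reads as A+” aren’t specified in this document, so only the percentage is computed):

```python
def percentage(earned: int, commerce_applicable: bool) -> float:
    # Commerce & reliability (25 pts) drops out of the denominator when
    # no pricing, signup, or API-docs surface is detected.
    denominator = 100 if commerce_applicable else 75
    return round(100 * earned / denominator, 1)
```

So a perfect informational site and a perfect commercial site land on the same 100.0%, while the worked example’s 18/75 comes out to 24.0%.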
The engine samples up to 2 sitemap-discovered pages in addition to the homepage. Multi-page-aggregated checks (image_alt_coverage, semantic_landmarks, link_descriptive_coverage, aria_hidden_misuse) average across all sampled pages so a noisy marketing homepage doesn’t drag down a site with a clean docs / blog template. Sitemap discovery uses the standard paths (/sitemap.xml, /sitemap_index.xml) AND the Sitemap: directive in robots.txt — Stripe-class sites that host the sitemap at a non-standard path still pass.
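Sitemap discovery from both the standard paths and the robots.txt Sitemap: directive can be sketched with nothing but string handling (the real engine's fetch and validation logic isn't shown; this only builds the candidate list):

```python
def sitemap_candidates(origin: str, robots_txt: str) -> list[str]:
    # Standard locations first, then any Sitemap: directives from robots.txt.
    candidates = [f"{origin}/sitemap.xml", f"{origin}/sitemap_index.xml"]
    for line in robots_txt.splitlines():
        # partition on the FIRST colon so URLs (which contain ":") survive.
        key, _, value = line.partition(":")
        if key.strip().lower() == "sitemap" and value.strip():
            candidates.append(value.strip())
    return candidates
```

A site hosting its sitemap at a non-standard path still surfaces it here, as long as robots.txt declares it — which is exactly the Stripe-class case the text describes.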
How to use it
- Open /usability-report.
- Paste a URL you own or are authorized to scan (the Terms cover this — only sites you own or have explicit permission for).
- Wait ~2 minutes. Read the report inline or download the markdown.
Limits
- One scan per domain per year. After a successful scan, the domain is locked. If you re-paste the URL later, the form shows the cached report instead of running a new scan.
- 10 scans per IP per hour.
- 60 seconds between scans of distinct domains.
- Reports default to private. Visibility can be set to public at scan time, in which case the report is renderable at /scan/<domain>.
What the score does not capture
- Actual uptime — we score publicly disclosed operational signals (status page, SLA docs, changelog), not measured uptime.
- API quality — having an OpenAPI doc passes the check; whether the API itself is well-designed is editorial, not deterministic.
- Brand and trust beyond signals — we read the security headers, not the company.
If you need a full integration audit, this isn’t it. If you need a fast, comparable verdict on whether a site is even trying to serve agents, this is it.
Related
- The Headless Economy — what the report is scoring you against
- Product Design for Agents — what high-scoring sites tend to share
- The Headless Customer Funnel — the consumption stages the report mirrors