/knowledge-base · tools

The Agent Usability Report

What the scanner does, how the score is built from 53 deterministic checks across five categories, and how to read the report it gives you back.

What it is

A free scanner at /usability-report. Paste any URL you’re authorized to scan; we give you a markdown report telling you whether an AI agent could actually use that business — discover it, call its API, pay for it, and rely on it in a workflow.

It is not an SEO audit. SEO asks “will Google rank you?” This asks “if a model decides to spend money on this site, can it succeed?”

Why it exists

Most websites today were built for humans clicking. The headless economy assumes the user is software. The gap between “looks fine in a browser” and “works for an agent” is wider than most operators realize. The report makes that gap legible.

How it works

STEP 01 · Probe: ~35 HTTP requests + sitemap-derived multi-page sample
STEP 02 · Score: 53 fixed checks → number / 100 (75 for info sites)
STEP 03 · Write: deterministic verdict + ranked quick wins
STEP 04 · Report: markdown + signed link

The model never decides points. It writes the explanation after the score is fixed. Two scans of the same site produce the same number.
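
A minimal sketch of that ordering in Python (the names CheckResult, score, and explain are illustrative, not the engine's internals): the arithmetic runs first, and the verdict is generated from a result that is already fixed.

```python
from dataclasses import dataclass

@dataclass
class CheckResult:
    check_id: str
    points_earned: float    # pass = full weight, partial = part of it, fail = 0
    points_possible: float

def score(results: list[CheckResult]) -> float:
    # Pure arithmetic over fixed check results: same inputs, same number.
    return sum(r.points_earned for r in results)

def explain(total: float, results: list[CheckResult]) -> str:
    # Stand-in for the model-written verdict; it reads the score, never sets it.
    gaps = [r.check_id for r in results if r.points_earned < r.points_possible]
    return f"Scored {total:.0f}. Biggest gaps: {', '.join(gaps[:3]) or 'none'}."

def build_report(results: list[CheckResult]) -> dict:
    total = score(results)             # 1. the number is fixed here
    verdict = explain(total, results)  # 2. the prose is written afterwards
    return {"score": total, "verdict": verdict}
```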

The score — 100 points across 5 categories (75 for informational sites)

Category · Weight · Asks
Discovery · 25 · Can agents find your site, crawl it, and read it as markdown?
APIs & agent endpoints · 18 · Are there OpenAPI, MCP, WebMCP, or A2A surfaces an agent can call?
Content & semantics · 17 · If the agent fetches a page, can it actually parse what’s there?
Commerce & reliability · 25 · Can the agent buy, get an API key without sales, and rely on the service?
Security & trust · 15 · Is it safe to embed in an automated workflow?

Each category is a list of small, mechanically verifiable checks (≥ 1000 chars of body text in raw HTML, Strict-Transport-Security header present, /llms.txt returns 200, etc.). Pass / partial / fail. Points add up.
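
Here is roughly what three of those checks might look like, sketched with Python's requests library. The 0 / 0.5 / 1 return values and the partial-credit threshold are placeholders rather than the engine's real weights.

```python
import re
import requests

def check_hsts(url: str) -> float:
    # Strict-Transport-Security header present on the homepage response.
    resp = requests.get(url, timeout=10)
    return 1.0 if "strict-transport-security" in resp.headers else 0.0

def check_llms_txt(origin: str) -> float:
    # /llms.txt returns 200.
    resp = requests.get(origin.rstrip("/") + "/llms.txt", timeout=10)
    return 1.0 if resp.status_code == 200 else 0.0

def check_body_text_length(url: str) -> float:
    # >= 1000 chars of body text in the raw HTML; the partial-credit
    # threshold below is a placeholder, not the engine's real cut-off.
    html = requests.get(url, timeout=10).text
    text = re.sub(r"<script.*?</script>|<style.*?</style>|<[^>]+>", " ",
                  html, flags=re.S | re.I)
    chars = len(" ".join(text.split()))
    if chars >= 1000:
        return 1.0   # pass
    if chars >= 500:
        return 0.5   # partial
    return 0.0       # fail
```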

Emerging-protocol checks are weighted at 1 pt apiece by design. WebMCP, MCP discovery, A2A Agent Cards, agents.json, content-negotiation-for-markdown — these are all live design work, not settled standards. Penalising 3 pts per missing manifest would misrepresent the engine as authoritative when much of the surface is in flux. Stable surfaces (HTTPS, sitemap, JSON-LD, security headers, OAuth metadata, public API docs) carry the weight; emerging surfaces are flagged as opportunities without anchoring the score.

The four usability tiers

Below the score, four binary questions tell you which kind of site you’re looking at — independent of the number. Read them in journey order: an agent has to find the site before it can call it, has to be able to call it before it can credential itself, and has to credential itself before it can run a workflow.

Tier · Asks · Passes if
T1 — Discoverable · Can an agent discover this site? · At least 2 of: AI crawler directives, llms.txt, sitemap, OpenAPI
T2 — Callable · Anything for an agent to call? · APIs category earned ≥ 6 (e.g. public API docs page + discoverable API base URL)
T3 — Provisionable · Can it credential itself? · Self-serve API key generation found AND auth method documented
T4 — Operable · Can it run a workflow? · Machine-aligned pricing AND no bot challenge on first request

Tiers met · Label
4 of 4 · headless-native
3 of 4 · meaningfully headless
2 of 4 · API-adjacent
0–1 of 4 · not meaningfully headless

A site can score well on polish and still be API-adjacent because it has no API. The label catches that.
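
A hedged sketch of that tier logic; the input flags are stand-ins for the engine's real check results, but the thresholds and labels follow the tables above.

```python
def tier_label(tiers_met: int) -> str:
    if tiers_met == 4:
        return "headless-native"
    if tiers_met == 3:
        return "meaningfully headless"
    if tiers_met == 2:
        return "API-adjacent"
    return "not meaningfully headless"   # 0 or 1 of 4

def evaluate_tiers(
    discovery_signals: int,          # how many of: AI crawler directives, llms.txt, sitemap, OpenAPI
    api_points: float,               # points earned in APIs & agent endpoints
    self_serve_keys: bool,
    auth_documented: bool,
    machine_aligned_pricing: bool,
    bot_challenge_on_first_request: bool,
):
    t1_discoverable = discovery_signals >= 2
    t2_callable = api_points >= 6
    t3_provisionable = self_serve_keys and auth_documented
    t4_operable = machine_aligned_pricing and not bot_challenge_on_first_request
    tiers = [t1_discoverable, t2_callable, t3_provisionable, t4_operable]
    return tiers, tier_label(sum(tiers))
```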

Hard caps — when the score is capped regardless

A few failures are bad enough that no amount of polish elsewhere should hide them:

If… · Total capped at
Bot challenge or CAPTCHA on first request · 39
No machine-callable surface at all · 49
GET / returns 4xx/5xx to a bot · 49
Security & trust < 3 · 69
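
One way to read the table: the additive total is computed first, then clamped to the lowest cap that fires. A sketch with illustrative flag names:

```python
def apply_caps(
    additive_total: float,
    bot_challenge_on_first_request: bool,
    has_machine_callable_surface: bool,
    homepage_errors_for_bots: bool,   # GET / returned 4xx/5xx to a bot
    security_trust_points: float,
) -> float:
    caps = []
    if bot_challenge_on_first_request:
        caps.append(39)
    if not has_machine_callable_surface:
        caps.append(49)
    if homepage_errors_for_bots:
        caps.append(49)
    if security_trust_points < 3:
        caps.append(69)
    return min([additive_total, *caps])
```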

What the report looks like

example.com — agent usability report
Score: 72 / 100 · Grade B
Category: meaningfully headless · Tiers met: 3 of 4
Quick wins · +10 pts available
  • Add HSTS header +3
  • Add /llms.txt +3
  • Document rate limits +2
  • Fix heading hierarchy +2
Score breakdown
Discovery 16 / 25
APIs & agent endpoints 18 / 18
Content & semantics 13 / 17
Commerce & reliability 14 / 25
Security & trust 11 / 15
+ per-check detail · prioritized fixes · navigation log

You also get a markdown download for filing or sharing internally.

How to read your report

A finished report is dense by design. Read it in this order:

  1. Score and tiers met — the headline number plus the X-of-4 tier count. Together they tell you “how polished” + “how headless”.
  2. What this means — a plain-English paragraph derived directly from which tiers your site passes, plus a per-tier table (T1 Discoverable / T2 Callable / T3 Provisionable / T4 Operable, ✓ or ✗ for your site, with a one-line “what’s missing” for each ✗).
  3. Cap callout — only present when a hard cap is suppressing your score. It names the cap value, the cause (e.g. “bot challenge or CAPTCHA on first request”), and what your additive total would be without the cap. Removing the cap condition is always the highest-leverage fix.
  4. Score breakdown — the five categories with how much each earned out of its weight.
  5. Quick wins — the top five highest-gain fixes ranked by points-recoverable.
  6. Per-category checks — every check, pass or fail, with an actionable card on each ✗ / △ row.

A worked example. Imagine a fictional events site, whats-on-la.org, that scores 18 / 75 with 1 of 4 tiers met — Commerce & reliability is marked n/a because no commerce surface was detected, so the denominator is 75, not 100. The “What this means” block tells you, in one paragraph: Your site passes 1 of 4 usability tiers (T1 Discoverable). The table below shows where the gaps are. The per-tier table shows T1 ✓ (“sitemap + AI-crawler directives present”) and T2 ✗ (“APIs category earned 0 / 18 (need ≥ 6)”); T3 and T4 read as “doesn’t apply to informational sites”. There is no cap callout because no cap fired. The category breakdown shows where the 18 points came from. The quick-wins list ranks the cheapest improvements.

The reading order matters. Tiers met is the diagnosis; score is polish over the diagnosis; per-check rows are the punchlist. An 18/75 with 1 of 4 tiers met means the architecture is wrong and polish won’t help — you need a callable API. A 90/100 with all four tiers met means the architecture is right and polish will help — work the quick-wins list.

Informational and non-commercial sites are scored against a smaller denominator. When the scanner detects no pricing, signup, or API-docs surfaces, Commerce & reliability is marked n/a and the maximum drops from 100 to 75. A perfect 75/75 sits at 100%, grade A+, and headless-native, exactly like a perfect 100/100 — same percentage, same label. T3 (Provisionable) and T4 (Operable) read as “doesn’t apply” rather than ✗ for those sites.
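
A quick sanity check of that arithmetic (the function name is hypothetical):

```python
def percentage(earned: float, commerce_applies: bool) -> float:
    max_points = 100 if commerce_applies else 75   # Commerce & reliability n/a -> /75
    return 100 * earned / max_points

# A perfect informational site and a perfect commercial site read identically:
assert percentage(75, commerce_applies=False) == percentage(100, commerce_applies=True) == 100.0
```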

The engine samples up to 2 sitemap-discovered pages in addition to the homepage. Multi-page-aggregated checks (image_alt_coverage, semantic_landmarks, link_descriptive_coverage, aria_hidden_misuse) average across all sampled pages so a noisy marketing homepage doesn’t drag down a site with a clean docs / blog template. Sitemap discovery uses the standard paths (/sitemap.xml, /sitemap_index.xml) AND the Sitemap: directive in robots.txt — Stripe-class sites that host the sitemap at a non-standard path still pass.
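
A sketch of that discovery-and-averaging behaviour using the requests library; helper names are illustrative and error handling is omitted.

```python
import requests
from statistics import mean

def discover_sitemaps(origin: str) -> list[str]:
    origin = origin.rstrip("/")
    candidates = [f"{origin}/sitemap.xml", f"{origin}/sitemap_index.xml"]
    # Also honour Sitemap: directives in robots.txt, so non-standard paths work.
    robots = requests.get(f"{origin}/robots.txt", timeout=10)
    if robots.ok:
        for line in robots.text.splitlines():
            if line.lower().startswith("sitemap:"):
                candidates.append(line.split(":", 1)[1].strip())
    return [u for u in candidates if requests.get(u, timeout=10).ok]

def averaged_check(homepage: str, sampled_pages: list[str], check) -> float:
    # Multi-page checks average over the homepage plus up to 2 sampled pages,
    # so one noisy template doesn't dominate the result.
    pages = [homepage] + sampled_pages[:2]
    return mean(check(url) for url in pages)
```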

How to use it

  1. Open /usability-report.
  2. Paste a URL you own or are authorized to scan (the Terms cover this — only sites you own or have explicit permission for).
  3. Wait ~2 minutes. Read the report inline or download the markdown.

Limits

  • One scan per domain per year. After a successful scan, the domain is locked. If you re-paste the URL later, the form shows the cached report instead of running a new scan.
  • 10 scans per IP per hour.
  • 60 seconds between scans of distinct domains.
  • Reports default to private. Visibility can be set to public at scan time, in which case the report is renderable at /scan/<domain>.

What the score does not capture

  • Actual uptime — we score publicly disclosed operational signals (status page, SLA docs, changelog), not measured uptime.
  • API quality — having an OpenAPI doc passes the check; whether the API itself is well-designed is editorial, not deterministic.
  • Brand and trust beyond signals — we read the security headers, not the company.

If you need a full integration audit, this isn’t it. If you need a fast, comparable verdict on whether a site is even trying to serve agents, this is it.