/knowledge-base · tools

The Agent Usability Report

What the scanner does, how the score is built from 53 deterministic checks across five categories, and how to read the report it gives you back.

What it is

A free scanner at /usability-report. Paste any URL you’re authorized to scan; we give you a markdown report telling you whether an AI agent could actually use that business — discover it, call its API, pay for it, and rely on it in a workflow.

It is not an SEO audit. SEO asks “will Google rank you?” This asks “if a model decides to spend money on this site, can it succeed?”

Why it exists

Most websites today were built for humans clicking. The headless economy assumes the user is software. The gap between “looks fine in a browser” and “works for an agent” is wider than most operators realize. The report makes that gap legible.

How it works

STEP 01 · Probe: ~35 HTTP requests + sitemap-derived multi-page sample
STEP 02 · Score: 53 fixed checks → number / 100 (75 for info sites)
STEP 03 · Write: deterministic verdict + ranked quick wins
STEP 04 · Report: markdown + signed link

The model never decides points. It writes the explanation after the score is fixed. Two scans of the same site produce the same number.
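
A minimal sketch of that ordering in Python (the names CheckResult, score, and explain are illustrative, not the engine's internals): the arithmetic runs first, and the verdict is generated from a result that is already fixed.

```python
from dataclasses import dataclass

@dataclass
class CheckResult:
    check_id: str
    points_earned: float    # pass = full weight, partial = part of it, fail = 0
    points_possible: float

def score(results: list[CheckResult]) -> float:
    # Pure arithmetic over fixed check results: same inputs, same number.
    return sum(r.points_earned for r in results)

def explain(total: float, results: list[CheckResult]) -> str:
    # Stand-in for the model-written verdict; it reads the score, never sets it.
    gaps = [r.check_id for r in results if r.points_earned < r.points_possible]
    return f"Scored {total:.0f}. Biggest gaps: {', '.join(gaps[:3]) or 'none'}."

def build_report(results: list[CheckResult]) -> dict:
    total = score(results)             # 1. the number is fixed here
    verdict = explain(total, results)  # 2. the prose is written afterwards
    return {"score": total, "verdict": verdict}
```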

The score — 100 points across 5 categories (75 for informational sites)

Category · Weight · Asks
Discovery · 25 · Can agents find your site, crawl it, and read it as markdown?
APIs & agent endpoints · 18 · Are there OpenAPI, MCP, WebMCP, or A2A surfaces an agent can call?
Content & semantics · 17 · If the agent fetches a page, can it actually parse what’s there?
Commerce & reliability · 25 · Can the agent buy, get an API key without sales, and rely on the service?
Security & trust · 15 · Is it safe to embed in an automated workflow?

Each category is a list of small, mechanically verifiable checks (≥ 1000 chars of body text in raw HTML, Strict-Transport-Security header present, /llms.txt returns 200, etc.). Pass / partial / fail. Points add up.
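
Here is roughly what three of those checks might look like, sketched with Python's requests library. The 0 / 0.5 / 1 return values and the partial-credit threshold are placeholders rather than the engine's real weights.

```python
import re
import requests

def check_hsts(url: str) -> float:
    # Strict-Transport-Security header present on the homepage response.
    resp = requests.get(url, timeout=10)
    return 1.0 if "strict-transport-security" in resp.headers else 0.0

def check_llms_txt(origin: str) -> float:
    # /llms.txt returns 200.
    resp = requests.get(origin.rstrip("/") + "/llms.txt", timeout=10)
    return 1.0 if resp.status_code == 200 else 0.0

def check_body_text_length(url: str) -> float:
    # >= 1000 chars of body text in the raw HTML; the partial-credit
    # threshold below is a placeholder, not the engine's real cut-off.
    html = requests.get(url, timeout=10).text
    text = re.sub(r"<script.*?</script>|<style.*?</style>|<[^>]+>", " ",
                  html, flags=re.S | re.I)
    chars = len(" ".join(text.split()))
    if chars >= 1000:
        return 1.0   # pass
    if chars >= 500:
        return 0.5   # partial
    return 0.0       # fail
```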

Emerging-protocol checks are weighted at 1 pt apiece by design. WebMCP, MCP discovery, A2A Agent Cards, agents.json, content-negotiation-for-markdown — these are all live design work, not settled standards. Penalising 3 pts per missing manifest would misrepresent the engine as authoritative when much of the surface is in flux. Stable surfaces (HTTPS, sitemap, JSON-LD, security headers, OAuth metadata, public API docs) carry the weight; emerging surfaces are flagged as opportunities without anchoring the score.

The four usability tiers

Below the score, four binary questions tell you which kind of site you’re looking at — independent of the number. Read them in journey order: an agent has to find the site before it can call it, has to be able to call it before it can credential itself, and has to credential itself before it can run a workflow.

Tier · Asks · Passes if
T1 — Discoverable · Can an agent discover this site? · At least 2 of: AI crawler directives, llms.txt, sitemap, OpenAPI
T2 — Callable · Anything for an agent to call? · APIs category earned ≥ 6 (e.g. public API docs page + discoverable API base URL)
T3 — Provisionable · Can it credential itself? · Self-serve API key generation found AND auth method documented
T4 — Operable · Can it run a workflow? · Machine-aligned pricing AND no bot challenge on first request

Tiers met · Label
4 of 4 · headless-native
3 of 4 · meaningfully headless
2 of 4 · API-adjacent
0–1 of 4 · not meaningfully headless

A site can score well on polish and still be API-adjacent because it has no API. The label catches that.
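
A hedged sketch of that tier logic; the input flags are stand-ins for the engine's real check results, but the thresholds and labels follow the tables above.

```python
def tier_label(tiers_met: int) -> str:
    if tiers_met == 4:
        return "headless-native"
    if tiers_met == 3:
        return "meaningfully headless"
    if tiers_met == 2:
        return "API-adjacent"
    return "not meaningfully headless"   # 0 or 1 of 4

def evaluate_tiers(
    discovery_signals: int,          # how many of: AI crawler directives, llms.txt, sitemap, OpenAPI
    api_points: float,               # points earned in APIs & agent endpoints
    self_serve_keys: bool,
    auth_documented: bool,
    machine_aligned_pricing: bool,
    bot_challenge_on_first_request: bool,
):
    t1_discoverable = discovery_signals >= 2
    t2_callable = api_points >= 6
    t3_provisionable = self_serve_keys and auth_documented
    t4_operable = machine_aligned_pricing and not bot_challenge_on_first_request
    tiers = [t1_discoverable, t2_callable, t3_provisionable, t4_operable]
    return tiers, tier_label(sum(tiers))
```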

Hard caps — when the score is capped regardless

A few failures are bad enough that no amount of polish elsewhere should hide them:

If… · Total capped at
Bot challenge or CAPTCHA on first request · 39
No machine-callable surface at all · 49
GET / returns 4xx/5xx to a bot · 49
Security & trust < 3 · 69
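
One way to read the table: the additive total is computed first, then clamped to the lowest cap that fires. A sketch with illustrative flag names:

```python
def apply_caps(
    additive_total: float,
    bot_challenge_on_first_request: bool,
    has_machine_callable_surface: bool,
    homepage_errors_for_bots: bool,   # GET / returned 4xx/5xx to a bot
    security_trust_points: float,
) -> float:
    caps = []
    if bot_challenge_on_first_request:
        caps.append(39)
    if not has_machine_callable_surface:
        caps.append(49)
    if homepage_errors_for_bots:
        caps.append(49)
    if security_trust_points < 3:
        caps.append(69)
    return min([additive_total, *caps])
```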

What the report looks like

example.com — agent usability report
Score: 72 / 100 · Grade B
Category: meaningfully headless · Tiers met: 3 of 4
Quick wins · +10 pts available
  • Add HSTS header +3
  • Add /llms.txt +3
  • Document rate limits +2
  • Fix heading hierarchy +2
Score breakdown
Discovery 16 / 25
APIs & agent endpoints 18 / 18
Content & semantics 13 / 17
Commerce & reliability 14 / 25
Security & trust 11 / 15
+ per-check detail · prioritized fixes · navigation log

You also get a markdown download for filing or sharing internally.

How to read your report

A finished report is dense by design. Read it in this order:

  1. Score and tiers met — the headline number plus the X-of-4 tier count. Together they tell you “how polished” + “how headless”.
  2. What this means — a plain-English paragraph derived directly from which tiers your site passes, plus a per-tier table (T1 Discoverable / T2 Callable / T3 Provisionable / T4 Operable, ✓ or ✗ for your site, with a one-line “what’s missing” for each ✗).
  3. Cap callout — only present when a hard cap is suppressing your score. It names the cap value, the cause (e.g. “bot challenge or CAPTCHA on first request”), and what your additive total would be without the cap. Removing the cap condition is always the highest-leverage fix.
  4. Score breakdown — the five categories with how much each earned out of its weight.
  5. Quick wins — the top five highest-gain fixes ranked by points-recoverable.
  6. Per-category checks — every check, pass or fail, with an actionable card on each ✗ / △ row.

A worked example. Imagine a fictional events site, whats-on-la.org, that scores 18 / 75 with 1 of 4 tiers met — Commerce & reliability is marked n/a because no commerce surface was detected, so the denominator is 75, not 100. The “What this means” block tells you, in one paragraph: Your site passes 1 of 4 usability tiers (T1 Discoverable). The table below shows where the gaps are. The per-tier table shows T1 ✓ (“sitemap + AI-crawler directives present”) and T2 ✗ (“APIs category earned 0 / 18 (need ≥ 6)”); T3 and T4 read as “doesn’t apply to informational sites”. There is no cap callout because no cap fired. The category breakdown shows where the 18 points came from. The quick-wins list ranks the cheapest improvements.

The reading order matters. Tiers met is the diagnosis; score is polish over the diagnosis; per-check rows are the punchlist. An 18/75 with 1 of 4 tiers met means the architecture is wrong and polish won’t help — you need a callable API. A 90/100 with all four tiers met means the architecture is right and polish will help — work the quick-wins list.

Informational and non-commercial sites are scored against a smaller denominator. When the scanner detects no pricing, signup, or API-docs surfaces, Commerce & reliability is marked n/a and the maximum drops from 100 to 75. A perfect 75/75 sits at 100%, grade A+, and headless-native, exactly like a perfect 100/100 — same percentage, same label. T3 (Provisionable) and T4 (Operable) read as “doesn’t apply” rather than ✗ for those sites.
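
A quick sanity check of that arithmetic (the function name is hypothetical):

```python
def percentage(earned: float, commerce_applies: bool) -> float:
    max_points = 100 if commerce_applies else 75   # Commerce & reliability n/a -> /75
    return 100 * earned / max_points

# A perfect informational site and a perfect commercial site read identically:
assert percentage(75, commerce_applies=False) == percentage(100, commerce_applies=True) == 100.0
```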

The engine samples up to 2 sitemap-discovered pages in addition to the homepage. Multi-page-aggregated checks (image_alt_coverage, semantic_landmarks, link_descriptive_coverage, aria_hidden_misuse) average across all sampled pages so a noisy marketing homepage doesn’t drag down a site with a clean docs / blog template. Sitemap discovery uses the standard paths (/sitemap.xml, /sitemap_index.xml) AND the Sitemap: directive in robots.txt — Stripe-class sites that host the sitemap at a non-standard path still pass.
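
A sketch of that discovery-and-averaging behaviour using the requests library; helper names are illustrative and error handling is omitted.

```python
import requests
from statistics import mean

def discover_sitemaps(origin: str) -> list[str]:
    origin = origin.rstrip("/")
    candidates = [f"{origin}/sitemap.xml", f"{origin}/sitemap_index.xml"]
    # Also honour Sitemap: directives in robots.txt, so non-standard paths work.
    robots = requests.get(f"{origin}/robots.txt", timeout=10)
    if robots.ok:
        for line in robots.text.splitlines():
            if line.lower().startswith("sitemap:"):
                candidates.append(line.split(":", 1)[1].strip())
    return [u for u in candidates if requests.get(u, timeout=10).ok]

def averaged_check(homepage: str, sampled_pages: list[str], check) -> float:
    # Multi-page checks average over the homepage plus up to 2 sampled pages,
    # so one noisy template doesn't dominate the result.
    pages = [homepage] + sampled_pages[:2]
    return mean(check(url) for url in pages)
```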

How to use it

  1. Open /usability-report.
  2. Paste a URL you own or are authorized to scan (the Terms cover this — only sites you own or have explicit permission for).
  3. Wait ~2 minutes. Read the report inline or download the markdown.

Limits

  • One scan per domain per year. After a successful scan, the domain is locked. If you re-paste the URL later, the form shows the cached report instead of running a new scan.
  • 10 scans per IP per hour.
  • 60 seconds between scans of distinct domains.
  • Reports default to private. Visibility can be set to public at scan time, in which case the report is renderable at /scan/<domain>.

What the score does not capture

  • Actual uptime — we score publicly disclosed operational signals (status page, SLA docs, changelog), not measured uptime.
  • API quality — having an OpenAPI doc passes the check; whether the API itself is well-designed is editorial, not deterministic.
  • Brand and trust beyond signals — we read the security headers, not the company.

If you need a full integration audit, this isn’t it. If you need a fast, comparable verdict on whether a site is even trying to serve agents, this is it.