AI does not reward persuasion. It rewards a claim it can extract and a source it can trust. Building the provable, agent-readable version of a brand is a discipline, not a campaign, and most of what the market sells as the answer is not it. This essay separates the work that moves the answer from the work that only looks like it does.
By Dan Muirhead · Research by AIVO Agent Team · June 2026 · ~17 min read
38–65%: Citation rate for original-research pages (vs 3–8% for product pages)
2.8x: Citation lift from presence on four or more independent surfaces
~15%: Share of retrieved pages AI actually cites
34.7: Average structured-data readiness across AIVO technical audits (/100)
Source: Princeton/Georgia Tech GEO and AuthorityTech (2026); Evertune via Clearscope (2026); Profound (2026); AIVO technical audits, n=106 (June 2026)
From the Founder
Proving it is not a markup problem
Dan Muirhead · Co-Founder, Head of Strategy
The first essay in this volume argued that AI builds its read of a brand at the moment of the question, mostly from sources the brand does not own. The natural next question, the one every operator asks us once they accept the frame, is what to actually do about it. And the most common answer in the market is a technical shopping list: add schema, publish an llms.txt file, feed the model your data. We watch companies spend real budget on that list and get very little back.
There is something real in the instinct. Structure does help a machine read a page, tooling does surface gaps, and a brand’s own content genuinely matters. Nate B Jones gave the useful version of the instinct a name. He argued that a company needs a truth layer, the agent-readable, provable version of itself distributed across the internet, because agents do not respond to emotional claims the way humans do. They need to verify. His phrase for it is exact: agents need you to prove it.
What the technical shopping list misses is that proving it is not a markup problem. It is an evidence problem and a consistency problem. The work that moves an AI answer is original research the model has to cite to use, claims paired with verifiable proof, an entity that resolves cleanly across every surface it appears on, and validation earned from sources the model already trusts. None of that comes from a plugin. And underneath all of it sits a question most brands never check: whether the machine is allowed to read them at all, a decision that now lives at the network edge, not in a text file.
What follows is the operational layer. It separates the moves that move the answer from the moves that only look like they do, it shows why the right tactics differ by engine and by intent, and it ends with a build order sequenced by leverage. The brands that treat this as a discipline will own their category’s answer. The brands that treat it as a checklist will keep paying for markup the machine never reads.
01 — The Provability Standard
AI rewards a claim it can extract and a source it can trust
Start with what actually earns a citation, because the data here is unambiguous and most content strategies ignore it. The single strongest cross-platform predictor of whether AI cites a page is whether the page carries original research or proprietary data. Pages built on original data are cited at 38 to 65 percent rates in external testing. Blog content sits at 6 to 15 percent. Product and marketing pages, the pages most brands pour the most effort into, sit at 3 to 8 percent. The gap is not a rounding difference. It is the difference between being the source and being ignored.
The mechanism is structural. AI composes an answer by extracting claims it can attribute and verify. A proprietary number, a dated benchmark, or a named study is inherently attributable, because the model cannot synthesize it from anywhere else. A persuasive adjective is not. “Industry-leading performance” gives the machine nothing to extract and no reason to trust it over the identical phrase on a competitor’s page. The provability standard is a short list: pair every claim with evidence, use dated specifics, lead with the answer, and keep one topic per page.
“You are going to be flattened into the internet average for your category.”
The format of the proof matters as much as its existence. Answer-first writing, where the direct answer lands in the first 50 to 80 words, is cited far more often, because AI pulls disproportionately from the top of a page: roughly 44 percent of ChatGPT citations come from the first third of a page, and answer-first paragraphs are cited about 67 percent more often than buried ones. Comparison tables with consistent columns are pulled around 3.7 times more than the same information written as prose. Transparent pricing pages, with real numbers, are cited about 3.1 times more than vague ones. These are not stylistic preferences. They are the extraction surface the machine is reaching for.
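The answer-first rule can be checked mechanically. The sketch below is a minimal illustration, not an AIVO tool: it finds where a known answer phrase starts in a page's text and tests whether it lands inside the first 80 words. The function names, the 80-word default, and the "Acme Sync" pricing sentence are all assumptions made for the example.

```python
from typing import List, Optional

PUNCTUATION = ".,;:!?\"'()"

def _tokens(text: str) -> List[str]:
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    stripped = (word.strip(PUNCTUATION) for word in text.lower().split())
    return [word for word in stripped if word]

def answer_word_offset(page_text: str, answer: str) -> Optional[int]:
    """Return the 0-based word index where `answer` first starts, or None."""
    words, target = _tokens(page_text), _tokens(answer)
    if not target:
        return None
    for i in range(len(words) - len(target) + 1):
        if words[i:i + len(target)] == target:
            return i
    return None

def is_answer_first(page_text: str, answer: str, max_words: int = 80) -> bool:
    """True when the whole answer phrase ends within the first `max_words` words."""
    offset = answer_word_offset(page_text, answer)
    if offset is None:
        return False
    return offset + len(_tokens(answer)) <= max_words

# Hypothetical page copy: the direct answer opens the page.
page = "Acme Sync costs $49 per user per month as of June 2026. " + "More detail follows. " * 40
print(is_answer_first(page, "$49 per user per month"))
```

A check like this only confirms position. It says nothing about whether the claim is dated, specific, or paired with evidence, which still takes an editor.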
What AI Cites, by Asset Type
Asset type
AI citation rate
Original research / proprietary data
38–65%
Structured comparison tables
~67% (vs 32.5% as prose)
Answer-first pages
~67% more often than buried answers
Blog / opinion content
6–15%
Product / marketing pages
3–8%
Source: Princeton/Georgia Tech GEO; AuthorityTech; Digital Applied; Kevin Indig (1.2M results); CXL (2026)
This is the operational meaning of the truth layer. Not more pages, and not better adjectives. Claims a machine can lift, attached to evidence a machine can trust.
02 — Entity Consistency Is Infrastructure
Multi-surface agreement is the prerequisite for being trusted at all
Provable content only works if the machine can tell that all of it describes the same company. This is the part of the job that looks like plumbing and behaves like strategy. AI systems do not accumulate knowledge as a pile of documents. They accumulate it as entities, and they resolve a brand by tracing references to it across many independent sources and checking whether those references agree. If they agree, the model gains confidence and cites. If they conflict, the model does something worse than rank you low. It drops you.
The threshold is measurable, and it is the same one that surfaced in the first essay from a different angle. Brands present across four or more independent authoritative platforms are roughly 2.8 times more likely to be cited than single-platform brands. Branded web mentions matter more than backlinks: brands in the top quartile for mentions earn around ten times more AI Overview citations than the next quartile. Authority, in the AI sense, is not a score on your own domain. It is the weight of consistent, corroborated references to you across domains you do not control.
“Models choose silence over citing you.”
Inconsistency is the quiet killer. If your site says “Acme Software,” your LinkedIn says “Acme,” your directory listing says “Acme Software Inc.,” and a trade outlet calls you “Acme Systems,” the model may treat those as separate entities and resolve none of them with confidence. Unverifiable or conflicting data does not produce a guess. It produces omission, because the model would rather stay silent than attribute a claim to the wrong company. The fix is unglamorous and high-leverage: one canonical name, one category statement, one short boilerplate posted verbatim everywhere, consistent executive names, and explicit links from your structured data to the authoritative entity records the engines already trust, Wikidata and Wikipedia among them.
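A first pass at the naming audit can be done with a few lines of code. The sketch below is an assumed, simplified check: it takes a hand-collected mapping of surface to the brand name shown there and flags every surface that does not match the canonical name exactly. The surface labels and the function names are hypothetical; the name variants are the ones from the example above.

```python
from typing import Dict, Set

def find_name_mismatches(names_by_surface: Dict[str, str], canonical: str) -> Dict[str, str]:
    """Return the surfaces whose displayed brand name is not the canonical name."""
    return {
        surface: name
        for surface, name in names_by_surface.items()
        if name.strip() != canonical
    }

def distinct_variants(names_by_surface: Dict[str, str]) -> Set[str]:
    """Return every distinct spelling of the brand name found across surfaces."""
    return {name.strip() for name in names_by_surface.values()}

# Hand-collected observations (illustrative values, not audit data).
observed = {
    "website": "Acme Software",
    "linkedin": "Acme",
    "directory": "Acme Software Inc.",
    "trade_outlet": "Acme Systems",
}

mismatches = find_name_mismatches(observed, canonical="Acme Software")
print(f"{len(distinct_variants(observed))} variants in use; fix: {sorted(mismatches)}")
```

The exact-match comparison is deliberate. A fuzzy match would hide the very variation that makes a model split one company into several entities.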
Our own audit data shows how far most sites are from this. Across 106 technical audits, crawlability is largely solved, with an average crawler-access score of 85.9, but the signals that build entity trust are thin. Average structured-data readiness is 34.7 out of 100. Publish dates appear on 43 percent of audited sites and author attribution on 40 percent. The schema that does exist is mostly identity markup, declaring that a company is an organization, with very little content-level markup that states a provable claim. Most brands have told the machine who they are. Very few have told it, consistently and verifiably, what they do.
Crawlability Is Solved. Entity Trust Is Not.
Signal | Score / Presence | Status
Crawler access | 85.9 / 100 | Largely solved
Structured data readiness | 34.7 / 100 | Gap
Pages with publish dates | 43% | Gap
Pages with author attribution | 40% | Gap
Source: AIVO technical audits (October 2025–June 2026)
Entity consistency is not a copywriting task or a one-time schema project. It is infrastructure that has to be maintained across every surface the brand touches, because the moment the surfaces disagree, the trust the machine was building collapses.
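Linking structured data to the entity records the engines already trust usually means a schema.org Organization block with a sameAs list. The sketch below builds one as JSON-LD. It is an illustration under stated assumptions: every URL, the Wikidata ID, and the boilerplate text are placeholders, and the function name is hypothetical. Consistent with the argument here, markup like this supports entity resolution and is not a citation lever on its own.

```python
import json
from typing import Dict, List

def organization_jsonld(name: str, url: str, description: str, same_as: List[str]) -> Dict:
    """Build a schema.org Organization object that points at external entity records."""
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,                # the one canonical name, used verbatim everywhere
        "url": url,
        "description": description,  # the same short boilerplate posted on every surface
        "sameAs": list(same_as),     # authoritative records describing the same entity
    }

# Placeholder values only; replace with the brand's real records.
markup = organization_jsonld(
    name="Acme Software",
    url="https://www.example.com",
    description="Acme Software makes scheduling software for field service teams.",
    same_as=[
        "https://www.wikidata.org/wiki/Q0000000",       # placeholder ID
        "https://en.wikipedia.org/wiki/Example",         # placeholder article
        "https://www.linkedin.com/company/example",      # placeholder profile
    ],
)

# The JSON below would go inside a <script type="application/ld+json"> tag.
print(json.dumps(markup, indent=2))
```

Generating the block from one source of truth, rather than hand-editing it per page, is one way to keep the name and boilerplate from drifting.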
03 — Original Research Is the Highest-Leverage Asset
A proprietary number is the one thing the model cannot get anywhere else
If provable content earns citations and original data earns the most, then original research is the highest-leverage asset a brand can build, and it deserves to be treated as such rather than as an occasional content-marketing flourish. The reason is simple: AI must cite unique data to use it. A proprietary benchmark, a first-party survey, or a number that exists nowhere else forces attribution, because the model cannot reproduce the claim without pointing at its source. Proof assets do not just rank better. They make the brand the origin of the answer.
The durability compounds the advantage. Original-data pages stay cited for six to twelve months, far longer than the news cycle of a blog post, which means a single rigorous study earns citations long after it ships. Broad trend research over-indexes at the top of the funnel, where the model is defining a category, and proof and benchmark data over-index at the bottom, where the model is justifying a recommendation. The implication for operators is to publish original data on a cadence rather than once, because citation velocity follows freshness as well as substance.
This also clarifies why earned validation and original research reinforce each other rather than competing for budget. Earned media makes up the large majority of the citation stream, on the order of 82 to 89 percent across recent multi-engine studies, and articles that cite a named study with a date and a sample size are pulled about 3.8 times more often than ones that wave at “research shows.” Original research is the thing that gets earned. A proprietary number is what a journalist quotes, what a Reddit thread references, what a competitor’s comparison page has to acknowledge. The asset you own becomes the validation you do not, which is the most efficient path into the four-or-more-surface threshold that decides citation eligibility.
There is a recursive honesty worth admitting here. This essay rests on roughly a million citations and 123,000 audit records, a proprietary data foundation, because that is exactly the kind of asset that earns the citation. We are making the argument with the asset the argument recommends.
04 — Engine, Intent, and the Access Layer
The tactics vary by engine, and none of them matter if the machine cannot read you
The truth layer is not built once for a generic “AI.” The engines behave differently enough that a single tactic cannot serve all of them. They differ first in how much they retrieve. In AIVO data, Google’s AI surfaces cite roughly 14 sources per answer and ChatGPT cites about 3, with Perplexity, Gemini, and Claude in between. More sources cited means more room to be one of them, so source and citation work has the most surface area on Google AI and Perplexity, while ChatGPT and Claude reward being the canonical, well-resolved entity over sheer citation volume. They differ second by intent. ChatGPT triggers a live web search on about 42 percent of brand and product queries overall, but on 73 percent of discovery queries and only 10 percent of informational ones. The retrieval-first engines, Perplexity and Google’s AI surfaces, read the live web on nearly every query.
The engines even keep different company. YouTube citations concentrate in Google AI and Perplexity. LinkedIn is largely a Google AI source and is essentially ignored by Perplexity. Reddit citations come mainly from ChatGPT, at the decision stage, and from Google AI, while Perplexity and Claude barely touch it. Medium is a Claude and Gemini habit. A brand chasing citations on the wrong platform for its priority engine is spending against the grain. One caution worth keeping honest: the owned-not-operated surfaces, a brand’s YouTube and LinkedIn presence, mostly show up today as third-party creator content rather than the brand’s own channel, so this lever is real but is realized through earned creators more than corporate accounts.
Underneath all of that sits the precondition everyone assumes and few verify: whether the machine is permitted to read the brand at all. This is where the most common technical advice is simply wrong. robots.txt is not enforcement. In Cloudflare’s own words, compliance with it is voluntary, a statement of preference that does not stop a crawler at a technical level. Real access control now lives at the CDN. AIVO’s live crawler probes show that where access is blocked, it is blocked at the network edge and applied identically across every AI user agent, not decided crawler by crawler in a text file.
Source: AIVO robots analyses and live crawler probes, n=65 sites (June 2026); blocking is enforced at the CDN/WAF and is crawler-agnostic
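Reading robots.txt is still worth doing, as long as the result is treated as a stated preference and not as proof of access. The sketch below uses Python's standard urllib.robotparser to report what a robots.txt file says about a handful of AI crawlers. The user-agent tokens in the list are assumptions to confirm against each vendor's current documentation, and the sample file is invented for the example.

```python
from typing import Dict
from urllib.robotparser import RobotFileParser

# Assumed crawler tokens; confirm each against the vendor's own documentation.
AI_USER_AGENTS = ["GPTBot", "OAI-SearchBot", "PerplexityBot", "ClaudeBot", "Google-Extended"]

def stated_preferences(robots_txt: str, path: str = "/") -> Dict[str, bool]:
    """Return, per crawler, whether robots.txt *says* it may fetch `path`.

    This reports the site's published preference only. It cannot show what a
    CDN or WAF actually does with the request.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, path) for agent in AI_USER_AGENTS}

# Invented sample: one training crawler disallowed, everything else allowed.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

for agent, allowed in stated_preferences(sample).items():
    print(f"{agent}: {'allowed' if allowed else 'disallowed'} (per robots.txt)")
```

A file that allows every crawler proves nothing if the network edge rejects the request, so this check belongs before a live probe, not in place of one.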
This matters more every quarter, because the default posture of the web is shifting toward gated access. Cloudflare now offers a one-click block on all AI bots, an option to block only on ad-monetized pages, a managed robots.txt with content-signal directives, and a Pay Per Crawl system that charges crawlers for access through an HTTP payment-required response and authenticated bot identity. Training opt-in and retrieval access are becoming separate, deliberate decisions: a brand can be readable in answers while opted out of training, which matters most for the parametric engines. The practical move is to check the CDN first, confirm retrieval bots are allowed, and decide training access on purpose rather than by accident.
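Checking the CDN first comes down to requesting the same page as each crawler and comparing what comes back. The sketch below covers only the interpretation step: given the HTTP status returned to each user agent, it labels the outcome and reports whether the edge treated every crawler identically. The status-to-label mapping is an assumption (402 is the standard HTTP "Payment Required" code; a real edge may also answer with a challenge page or a rate limit), and the sample statuses are invented, not AIVO probe data.

```python
from typing import Dict

def classify_status(status: int) -> str:
    """Map an HTTP status code to a coarse access outcome (assumed mapping)."""
    if 200 <= status < 300:
        return "readable"
    if status == 402:
        return "payment_required"   # standard HTTP 402 Payment Required
    if status in (401, 403):
        return "blocked"
    return "inconclusive"           # redirects, rate limits, server errors

def summarize_probe(status_by_agent: Dict[str, int]) -> Dict:
    """Label each crawler's outcome and flag whether all crawlers got the same one."""
    outcomes = {agent: classify_status(code) for agent, code in status_by_agent.items()}
    return {
        "outcomes": outcomes,
        "crawler_agnostic": len(set(outcomes.values())) == 1,
        "any_readable": "readable" in outcomes.values(),
    }

# Invented probe results for one page, keyed by the user agent sent.
probe = {"GPTBot": 403, "OAI-SearchBot": 403, "PerplexityBot": 403, "ClaudeBot": 403}
summary = summarize_probe(probe)
print(summary["crawler_agnostic"], summary["any_readable"])
```

When every crawler gets the same blocked outcome while robots.txt allows them, the decision is being made at the edge, and that is where the fix has to happen.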
The steelman of the opposing view is that this is too technical for marketing, and that a brand should simply buy a generative-optimization tool that adds schema and an llms.txt file and be done. The instinct to use tooling is fine. The conclusion is not. The levers that move AI (entity resolution, cross-domain corroboration, original research, earned validation, and CDN-level access) are not things a markup plugin produces. Schema added on its own produced no citation lift in controlled testing, as the first essay detailed. llms.txt sits at about 44 percent adoption among the sites we audit with no proven citation effect, a watch item rather than a lever. The work is editorial, research, public relations, and entity discipline. It is exactly the work Nate describes when he says marketing has to touch the surfaces that matter, the website, the pricing, the claims, the documentation, rather than decorate decisions made elsewhere.
05 — So What: Build the Truth Layer in Leverage Order
Five moves, sequenced by what actually moves the answer
The truth layer is buildable, and the sequence matters because the steps depend on each other. Access has to come before content, identity before citations, proof before promotion.
First, verify crawlability at the CDN. Confirm at the network edge, not just in robots.txt, that retrieval and search bots are allowed. Decide training access as a separate, deliberate choice. If the machine cannot read you, nothing downstream matters.
Second, fix entity consistency. Standardize one name, one category statement, one boilerplate, and one set of executive names across your website, your off-premise surfaces, and your earned coverage, and link your structured data to the authoritative entity records the engines trust. Resolve the entity before trying to win citations for it.
Third, produce proof assets. Publish original research and dated, specific, answer-first, claim-to-evidence content on a cadence. This is the highest-leverage owned work, and the asset that earns everything downstream.
Fourth, earn citations across four or more independent surfaces. Direct editorial, review, community, and reference effort at the surfaces the engines that matter for your category actually read. Use your proof assets as the thing those surfaces cite.
Fifth, de-prioritize the theater. Stop treating schema as a silver bullet, llms.txt as a growth hack, and “feed the model” JSON as a strategy. Do the structural work instead.
The Truth Layer Build Order
Step | Move | Why here
01 | Verify crawlability at the CDN | Access precedes everything; the lever is the WAF, not robots.txt
02 | Fix entity consistency | The machine must resolve one brand before it can trust it
03 | Produce proof assets | Original, dated, answer-first content is what gets cited
04 | Earn citations across 4+ surfaces | Corroboration across independent sources is the eligibility threshold
05 | De-prioritize the theater | Schema-alone, llms.txt, and "feed the model" do not move the answer
Source: AIVO
Moves That Work vs Moves That Only Look Like They Work
Moves that move the answer | Moves that only look like they do
Original research and proprietary data | Publishing more blog volume
Claim-to-evidence pairing, dated specifics | Persuasive adjectives and brand voice alone
Entity consistency across every surface | Identity schema with no content-level claims
Earned validation across 4+ sources | Schema markup added on its own
CDN-level access decisions | An llms.txt file as a growth hack
Source: AIVO; Ahrefs schema study (2026); AIVO technical audits (2026)
This is where AIVO does the work: verifying access, resolving the entity, building the proof assets, and earning the citations, then measuring the result across every engine. The framework is not the point. The reorientation is. The job is to become the most provable brand in the category, not the loudest.
Conclusion
The provable brand wins. The persuasive one gets averaged.
The first essay in this volume established that AI assembles its read of a brand from sources the brand mostly does not control. This essay is the answer to the question that follows. You do not control the interpretation, but you can build the thing the interpretation is made of: a provable, consistent, well-sourced version of the brand that a machine can read, resolve, and trust. That is the truth layer, and building it is a discipline of evidence and consistency, not a campaign and not a markup checklist.
The work has an order. Confirm the machine can read you, resolve your entity so it knows who you are, give it proof it can extract, and earn the corroboration that makes the proof credible. Everything else, the schema sold as a silver bullet, the llms.txt file sold as a shortcut, the promise that you can feed your way into the model, is theater that the data does not support. Agents need you to prove it, and proving it is real, specific, sequenced work.
A brand that cannot be verified will be flattened into the average of its category, described in the same interchangeable language as the competitors it spent years trying to beat. A brand that builds the truth layer becomes the source the machine reaches for. In the interpretation economy, the provable brand is the one that gets named.
Sources and Further Reading
AIVO, technical audits (n=106) and AI Visibility Funnel Matrix research (US-English, June 2026)
AIVO, audit data: ~1,008,850 citations and 123,186 records across 6 engines (June 2026)
Nate B Jones, The Prove-It Economy (YouTube, May 2026)
aiplusautomation (Lee), ChatGPT web-search trigger study (2026)
Ahrefs, We Tracked 1,885 Pages Adding Schema. AI Citations Barely Moved. (May 2026)
Cloudflare, managed robots.txt, Block AI Bots, AI Crawl Control, and Pay Per Crawl documentation and blog (2024–2026)
About the Author
Dan Muirhead is the Founder of AIVO, a strategic AI visibility consultancy powered by a proprietary intelligence platform. He helps brands in high-consideration industries like hotels, cruise, e-commerce, and SaaS get found, recommended, and chosen when customers ask AI for answers, and writes about how AI search actually decides what to recommend.