
AI Search Optimization Has a Publisher Trust Problem

Ali Khallad · Updated June 1, 2026 · 9 min read

AI search optimization has a source problem before it has a ranking problem.

Most advice still starts at the visible answer: did ChatGPT mention the brand, did Perplexity cite the page, did Google AI Overviews show a competitor, did the user click? Those questions matter, but they skip the layer underneath. AI systems can only answer from material they can access, retrieve, trust, summarize, or license.

That source layer is getting more complicated. Publishers are suing AI companies. AI companies are signing licensing deals. Crawler rules are becoming a public negotiation. Some sites want visibility from AI answers; others want payment, control, or a stronger reason to keep letting their work feed answer engines.

That does not mean every lawsuit will succeed, or that every AI platform is doing the same thing. It means the open web bargain behind search is being renegotiated in real time. If your brand depends on AI systems to recommend, cite, or compare you, that bargain is now part of your visibility problem.

The source layer is no longer invisible

For years, the basic search exchange was easy to understand. Publishers let search engines crawl their pages. Search engines sent discovery, traffic, subscribers, customers, links, or reputation back to the source. The exchange was imperfect, but the shape was familiar.

AI answers change the shape of that exchange. A user can ask a question, get a summary, see a few cited links, and leave without visiting the publisher that helped produce the answer. Sometimes that is useful for the user. Sometimes it is useful for the source. Sometimes the source does the work and gets little visible return.

That tension explains why source access is moving from background infrastructure to a front-page business issue. In May 2026, CNN sued Perplexity, alleging unlawful content distribution, according to Reuters. That is an allegation in active litigation. It is not a court finding. The useful visibility lesson is narrower: major publishers are now treating AI answer systems as a direct economic and control issue.

Licensing points in the same direction. OpenAI has announced content or publisher partnerships with organizations including News Corp, Reddit, the Associated Press, Axel Springer, and the Financial Times. Those deals do not answer every copyright, traffic, or compensation question. They do show that reliable source access has become something worth negotiating.

AI answers still need someone else’s work

The awkward part of AI search is that the answer surface looks self-contained, while the answer itself depends on outside work. Product documentation, publisher reporting, forum threads, reviews, comparison pages, databases, manuals, and brand websites all become raw material.

That is easy to forget when the output is a clean paragraph. It is harder to forget when the sources start asking for a different deal.

Perplexity has an official Publishers Program. OpenAI documents multiple bots, including GPTBot, ChatGPT-User, and OAI-SearchBot. Google tells site owners that its AI experiences are covered by existing Search guidance and points them toward its AI optimization guide, which emphasizes helpful, crawlable content rather than a special AI-only markup layer.

Those public pages matter because they make the negotiation visible. AI companies are explaining how content can be accessed, controlled, or included. Publishers and site owners are deciding whether those terms work for them.

I would be careful about turning this into a morality play. Users want faster answers. Publishers need reasons to keep publishing. AI platforms need fresh and trustworthy sources. Brands need the source layer to be healthy enough that recommendations are not built from stale, thin, or inaccessible material. All of those can be true at once.

Crawler trust is now part of source trust

Robots rules used to feel like plumbing. In AI search, they are becoming a trust signal between site owners and answer engines.

A crawler is not just a crawler anymore. Site owners now have to think about classic search crawling, AI training, live retrieval when a user asks a question, and answer features that quote or summarize a page. Different platforms expose different controls for those activities. Different publishers have different reasons to allow or block them.

OpenAI’s bot documentation is one example of this split. It distinguishes between bots used for search, user-initiated fetches, and training-related crawling. Google uses its own Search controls and documentation. Other AI systems have their own rules, programs, and crawler behavior.
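A rough way to see that split is to parse a robots.txt file and ask what each named bot may fetch. The sketch below uses Python's standard urllib.robotparser. The rules, the example.com URL, and the access_by_agent helper are illustrative assumptions, not a recommended policy, and a given crawler's own reading of robots rules may differ from Python's parser.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only (an assumption, not a recommendation):
# block GPTBot, allow OAI-SearchBot, and leave every other agent
# under a generic wildcard group.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Disallow: /admin/
"""

# Bot names as listed in OpenAI's public bot documentation.
AGENTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User"]


def access_by_agent(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Return, per user agent, whether these robots rules permit fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    # Hypothetical page used only for the example.
    results = access_by_agent(ROBOTS_TXT, "https://example.com/pricing", AGENTS)
    for agent, allowed in results.items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

With these sample rules, GPTBot is blocked from the pricing page, OAI-SearchBot is allowed, and ChatGPT-User falls through to the wildcard group, which only restricts /admin/. The point is not the specific policy. It is that one file can express different answers for different kinds of AI access.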

The uncomfortable question is whether those controls are respected in ways publishers can verify. Cloudflare has publicly alleged that Perplexity used undeclared crawlers to get around no-crawl directives in some cases, in a post titled “Perplexity is using stealth, undeclared crawlers to evade website no-crawl directives.” That is Cloudflare’s allegation, not a court ruling. Perplexity has disputed related accusations in public responses. Still, the debate itself matters because AI visibility depends on trust in the fetching layer, not only trust in the final answer.

For a brand, the practical issue is simpler. If important pages are blocked by accident, AI systems may miss the best evidence about you. If important third-party sources block or restrict access, AI systems may lean on weaker sources. If crawler identity is hard to verify, publishers have less reason to trust the exchange.

Licensing can change which sources become visible

Licensing is often discussed as a copyright or revenue story. It is also a visibility story.

If an AI platform has licensed access to one set of sources and limited access to another, the answer engine’s view of a market can shift. That does not mean a licensed source automatically ranks or gets cited. It does mean availability, freshness, permissions, and product integration can become part of the source environment around an answer.

This is where AI search starts to feel different from old SEO. Classic search visibility was shaped by crawling, indexing, ranking, links, relevance, and many other signals. AI search adds another practical layer: the assistant may be combining indexed knowledge, live retrieval, partner data, citations, summaries, and policy rules inside one answer experience.

From the outside, we usually cannot see the full recipe. We can see the output. We can see citations when they are shown. We can test whether our own pages are reachable. We can watch whether certain publishers, forums, review sites, or documentation pages appear repeatedly for the questions that matter in our category.

That is enough to make better decisions than guessing.

Citations are clues, not proof of a healthy source economy

A citation in an AI answer can do useful work. It gives the user a place to check the claim. It gives the source visible credit. It gives a brand or publisher a clue about which pages influenced the answer.

But a citation does not settle the economics. A page can be cited without receiving meaningful traffic. A source can influence an answer without being named. A cited article can support one sentence while the rest of the answer comes from somewhere else. A weak source can make an answer look more grounded than it is.

That is why the best AI visibility work looks past the presence of a citation and asks better questions: does the source support the claim, is it current, is it accessible, does it have a reason to keep publishing, and does the answer change when that source disappears or gets replaced?

A recent arXiv paper on Google AI Overviews is useful because it treats AI answers as measurable objects. The exact platform behavior will keep changing. The habit is the important part: track which sources appear, what claims they support, and how AI answer sources differ from classic organic winners.

That last comparison is where a lot of teams get surprised. The source that ranks well in classic search is not always the source that gets used, cited, or surfaced in an AI answer. Sometimes the AI answer wants a more direct explanation. Sometimes it wants a comparison. Sometimes it wants a third-party source. Sometimes it uses whatever accessible source best fits the answer it is assembling.
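One simple way to make that comparison concrete is to reduce both lists to domains and look at the overlap. This is a minimal sketch, assuming you have already collected the URLs cited in an AI answer and the top organic results for the same question by hand or with your own tooling. The compare_sources helper and the sample URLs are hypothetical.

```python
from urllib.parse import urlparse


def domain(url: str) -> str:
    """Lowercased hostname without a leading 'www.' (empty string if missing)."""
    host = urlparse(url).hostname or ""
    return host.removeprefix("www.")


def compare_sources(ai_cited_urls: list[str], organic_urls: list[str]) -> dict:
    """Compare the domains cited in an AI answer with the organic results."""
    ai = {domain(u) for u in ai_cited_urls} - {""}
    organic = {domain(u) for u in organic_urls} - {""}
    union = ai | organic
    return {
        "shared": sorted(ai & organic),
        "ai_only": sorted(ai - organic),
        "organic_only": sorted(organic - ai),
        # Jaccard overlap: 1.0 means identical domain sets, 0.0 means disjoint.
        "jaccard": len(ai & organic) / len(union) if union else 0.0,
    }


if __name__ == "__main__":
    # Placeholder URLs, not real observations.
    ai_cited = ["https://www.example.com/guide", "https://forum.example.org/t/1"]
    organic = ["https://example.com/pricing", "https://vendor.example.net/docs"]
    print(compare_sources(ai_cited, organic))
```

A low overlap for the questions that matter is not a verdict by itself, but the ai_only list is usually where the surprises are.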

The visibility risk is not only losing a citation

The shallow version of this topic is: publishers are angry, AI companies need content, brands should track citations. That is true enough, but it is not the whole problem.

The deeper visibility risk is that the source map around a category can change without the brand changing anything.

A review site can block a crawler. A publisher can move behind a licensing deal. A forum can become the dominant source for lived experience. A vendor’s documentation can become the clearest available explanation. A competitor can earn repeated mentions because third-party sources describe them more clearly than they describe you.

For SaaS teams, that can change which comparison pages, review sites, and documentation pages shape AI recommendations. For ecommerce brands, it can change whether answers lean on editorial reviews, marketplace pages, Reddit threads, retailer pages, or brand-owned content. For agencies, it changes the client conversation from “you need more content” to “the sources AI trusts in this category are not the ones you expected.”

For publishers, the question is even sharper. If reporting, testing, reviewing, or explaining products mainly creates raw material for someone else’s answer interface, then citations alone may not be a strong enough incentive. That is why licensing and traffic are part of the trust problem, not a separate media-industry subplot.

How to evaluate source trust in AI search

The useful response is not panic. It is a source map.

Pick the questions buyers actually ask before choosing a product, agency, tool, platform, or service in your category. Run those questions across the AI systems your audience is likely to use. For each answer, record the brands named, the sources cited, the claims those sources support, and whether the cited sources are reachable and current.

Then separate the problems.

  • If AI cites old sources, you may have a freshness problem.
  • If AI cites competitors but not you, you may have a third-party evidence problem.
  • If AI cites weak pages, the category may have a source-quality problem.
  • If your own pages are missing, check whether they are crawlable, specific, and useful enough to answer the prompt.
  • If the source set changes suddenly, look for access changes, new publisher deals, crawler blocking, or a shift in the answer format.
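The first pass of that sorting can be sketched in a few lines. This is an illustration under stated assumptions, not a finished tool: the AnswerRecord and Citation shapes, the diagnose function, the flag names, and the one-year staleness threshold are all hypothetical choices, and judging whether a cited page is weak still needs a human reader.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional
from urllib.parse import urlparse


@dataclass
class Citation:
    url: str
    last_updated: Optional[date]  # None when the page shows no date


@dataclass
class AnswerRecord:
    prompt: str
    platform: str
    brands_named: list[str]
    citations: list[Citation]


def diagnose(record: AnswerRecord, our_brand: str, our_domain: str,
             today: date, stale_after_days: int = 365) -> list[str]:
    """Return coarse problem flags for one recorded AI answer."""
    flags = []

    # Freshness: at least one cited source is older than the assumed threshold.
    if any(c.last_updated and (today - c.last_updated).days > stale_after_days
           for c in record.citations):
        flags.append("freshness")

    # Third-party evidence: other brands are named, ours is not.
    others_named = any(b != our_brand for b in record.brands_named)
    if others_named and our_brand not in record.brands_named:
        flags.append("third_party_evidence")

    # Own pages missing: no citation points at our domain or a subdomain of it.
    hosts = [urlparse(c.url).hostname or "" for c in record.citations]
    if not any(h == our_domain or h.endswith("." + our_domain) for h in hosts):
        flags.append("own_pages_missing")

    return flags


if __name__ == "__main__":
    # Placeholder record, not a real observation.
    record = AnswerRecord(
        prompt="best tool for X",
        platform="example-assistant",
        brands_named=["CompetitorCo"],
        citations=[Citation("https://reviews.example.org/x", date(2024, 1, 1))],
    )
    print(diagnose(record, "ExampleBrand", "example.com", today=date(2026, 6, 1)))
```

Running the same prompts on a schedule and comparing the flagged records over time is one way to notice the sudden source-set changes described in the last bullet.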

This is close to a citation audit, but the goal is different. A citation audit checks whether a particular answer linked to a particular page. A source map asks whether the answer system has enough trustworthy, accessible, current material to build a good answer around your market.

That difference changes the work. You may need to improve your own page. You may need clearer third-party proof. You may need better documentation. You may need to earn mentions in sources that AI systems already trust. You may need to stop assuming that a Google ranking guarantees an AI citation.

Where to start

Start with access, because it is the easiest thing to rule in or out.

Check whether your important pages are crawlable. Look at your robots rules. Make sure the pages that explain your category, product, comparisons, pricing, limitations, and proof are not accidentally hidden from the systems that might fetch them. If you want a quick check, our robots.txt checker is built for that first pass.

Then look beyond your own site. Which publishers, communities, directories, review sites, and documentation pages keep showing up around your category? Are they credible? Are they current? Do they explain your brand accurately? Do they have any reason to keep participating in AI answer surfaces?

The open web is still the answer layer for a lot of AI search. The fragile part is that the web has to stay worth publishing to. If the best sources pull back, get locked behind deals, block crawlers, or stop investing in useful pages, AI answers do not magically become better. They look elsewhere.

That is why AI search optimization has to include source trust. Your visibility depends on what AI systems say about you, but also on whether the web around your category remains accessible, credible, and worth citing.