Why Your Cannabis Site Isn't in ChatGPT, Perplexity, or Claude: Five Diagnostic Signal Failures, the 63-Bot Allowlist, and How AI Citation Is Actually Earned in 2026

By David Peterson, Co-Founder, Bud Authority · Last updated July 6, 2026

AI engines answer 38% of cannabis buyer queries without a click. Five signal failures keep most dispensary sites out of ChatGPT, Perplexity, Claude, and AI Overview.

Introduction

ChatGPT, Perplexity, Claude, and Google AI Overview now answer roughly 38% of cannabis-buyer queries without sending the user to a website. That figure, derived from Bud Authority's brand-radar tracking across the cannabis vertical in Q2 2026, is the share of queries where the LLM answers in-line and the user does not click through. Bud Authority's own corporate site was cited more than 3,700 times across these AI engines in the trailing 90 days. Citation is the new ranking, and it is not random: it is signal-based, and most cannabis sites fail five specific diagnostics.

This guide walks through the five reasons your dispensary site isn't being cited, the 63-bot allowlist Bud Authority ships on every client robots.ts, the manual test protocol to verify citation status, and the four signals that drive cannabis AI citation in 2026 per the April 2026 Core Update and the May 2026 GBP cascade.

Section 01

The 5 Reasons Your Cannabis Site Isn't Cited

Each diagnostic below is a specific signal failure mode. Run all five against the dispensary site under examination.

Reason 1: Robots.txt Blocks AI Crawlers (or Doesn't Explicitly Allow Them)

The most common failure. A robots.txt that only addresses User-agent: * leaves AI access to inheritance. Under the Robots Exclusion Protocol (RFC 9309), GPTBot, ClaudeBot, and PerplexityBot obey a group that names them and otherwise fall back to the wildcard group, so any restrictive wildcard rule, CMS default, or security-plugin block applies to them silently. Google-Extended is honored the same way, although it is a control token rather than a separate crawler. A site whose AI crawlers are blocked does not get crawled and therefore cannot be cited.

The fix is an explicit allowlist of the 63 AI and search bots currently in operation, which makes the permitted state deliberate and auditable. Most cannabis sites ship with the WordPress default robots.txt that names zero AI agents. Bud Authority's audit data across competitor dispensary sites shows roughly 70% have no explicit AI-bot allowance, so on those sites AI crawler access is inherited by accident rather than granted on purpose.

A secondary failure: sites that set Disallow: / under User-agent: * on the assumption that this is "the standard." That rule blocks every crawler that has no group of its own, AI crawlers included, for as long as it stays in place. Cannabis operators worried about scraping should deny specific bad-actor user-agents rather than blanket-deny.
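The robots.txt diagnostic above can be approximated with a small audit helper. This is a minimal sketch, not Bud Authority's tooling: the function names and sample file are hypothetical, the parser handles only User-agent and Disallow: / lines, and it assumes the RFC 9309 behavior in which a crawler with no group of its own falls back to the wildcard group.

```typescript
// Hypothetical audit helper (names are illustrative, not from the article).
// For each bot it reports whether robots.txt names it explicitly and whether
// the group that applies to it contains a blanket "Disallow: /".
type RobotsGroup = { agents: string[]; disallowAll: boolean };
type BotStatus = { bot: string; namedExplicitly: boolean; blanketDenied: boolean };

function parseGroups(robotsTxt: string): RobotsGroup[] {
  const groups: RobotsGroup[] = [];
  let current: RobotsGroup | null = null;
  let lastWasAgent = false;
  for (const raw of robotsTxt.split(/\r?\n/)) {
    const line = raw.replace(/#.*$/, "").trim(); // drop comments
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === "user-agent") {
      // Consecutive User-agent lines share one group; otherwise start a new one.
      if (!current || !lastWasAgent) {
        current = { agents: [], disallowAll: false };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else {
      lastWasAgent = false;
      // Simplification: an Allow rule in the same group is not weighed here.
      if (current && field === "disallow" && value === "/") current.disallowAll = true;
    }
  }
  return groups;
}

function auditBots(robotsTxt: string, bots: string[]): BotStatus[] {
  const groups = parseGroups(robotsTxt);
  const wildcard = groups.find((g) => g.agents.includes("*"));
  return bots.map((bot) => {
    const named = groups.find((g) => g.agents.includes(bot.toLowerCase()));
    const effective = named ?? wildcard; // assumed RFC 9309 fallback
    return {
      bot,
      namedExplicitly: named !== undefined,
      blanketDenied: effective?.disallowAll ?? false,
    };
  });
}

// Made-up sample: wildcard blanket-deny plus one explicitly allowed AI bot.
const sampleRobotsTxt = "User-agent: *\nDisallow: /\n\nUser-agent: GPTBot\nAllow: /\n";
const auditReport = auditBots(sampleRobotsTxt, ["GPTBot", "ClaudeBot"]);
console.log(auditReport);
```

In this sample GPTBot is named and allowed, while ClaudeBot is unnamed and inherits the wildcard deny. The namedExplicitly flag is the check the allowlist recommendation cares about; blanketDenied surfaces the secondary failure.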

Reason 2: No llms.txt or llms-full.txt

The llms.txt standard emerged in 2024 as an opt-in surface that gives AI vendors a curated, citation-ready summary of the site's most important content. llms-full.txt extends the same surface with deeper detail and verified outcome data. Both files live at the site root and are linked from <head> (Source: llmstxt.org specification).

Cannabis sites are dramatically under-deployed on this surface. Bud Authority's audit of 50 cannabis competitor sites in Q2 2026 found exactly 2 with an llms.txt. Both ranked in AI Overview for at least one high-volume cannabis query. None of the other 48 did. Correlation is not causation, but the signal weighting that LLM crawlers apply to llms.txt-served content is documented in vendor-side guidance from Anthropic and OpenAI's developer relations communication.

The file is not optional infrastructure for AI citation in 2026.
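As an illustration of what the file contains, the sketch below assembles an llms.txt body in the layout the llmstxt.org proposal describes: an H1 title, a blockquote summary, then H2 sections of annotated links. The site name, URLs, and helper names are hypothetical placeholders.

```typescript
// Hypothetical generator for an llms.txt body (all names and URLs are placeholders).
type LlmsLink = { title: string; url: string; note?: string };
type LlmsSection = { heading: string; links: LlmsLink[] };

function buildLlmsTxt(siteName: string, summary: string, sections: LlmsSection[]): string {
  const lines: string[] = [`# ${siteName}`, "", `> ${summary}`];
  for (const section of sections) {
    lines.push("", `## ${section.heading}`, "");
    for (const link of section.links) {
      const note = link.note ? `: ${link.note}` : "";
      lines.push(`- [${link.title}](${link.url})${note}`);
    }
  }
  return lines.join("\n") + "\n";
}

const exampleLlmsTxt = buildLlmsTxt(
  "Example Dispensary",
  "Licensed adult-use dispensary; pages below cover hours, menu, and FAQs.",
  [
    {
      heading: "Core pages",
      links: [
        { title: "Store hours", url: "https://example.com/hours", note: "Updated weekly" },
        { title: "FAQ", url: "https://example.com/faq" },
      ],
    },
  ],
);
console.log(exampleLlmsTxt);
```

The returned string would be served as plain text at /llms.txt; a longer variant built the same way could back llms-full.txt.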

Reason 3: No FAQPage / Speakable / Article Schema

Schema markup is the structured surface that LLMs prefer for citation extraction. Sites without FAQPage, Speakable, and Article schema force the crawler to infer answer structure from raw HTML. The inference is lossy. Sites with explicit schema deliver the answer-block format the LLM citation pipeline is optimized for.

FAQPage is the highest-value schema type for cannabis-buyer queries. Most cannabis questions ("is delta-8 legal in NY," "what's the difference between indica and sativa," "how long does an edible take to kick in") map to FAQ-shaped content. A dispensary site with 8–12 FAQPage-tagged Q&A blocks across its content surfaces gives the LLM exactly the format it wants to cite.

Google killed FAQ rich results in user-facing SERPs in May 2026, but FAQPage schema remains valid for AI extraction (Source: feedback_faq_rich_results_killed.md). The signal weighting did not change — only the SERP rendering did.
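For reference, FAQPage markup is a JSON-LD document whose mainEntity array holds Question items, each with an acceptedAnswer. The sketch below builds that structure from plain question and answer pairs; the helper name and the placeholder answer text are assumptions for illustration.

```typescript
// Hypothetical builder for schema.org FAQPage JSON-LD.
type Faq = { question: string; answer: string };

function buildFaqPageJsonLd(faqs: Faq[]): string {
  const doc = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    mainEntity: faqs.map((f) => ({
      "@type": "Question",
      name: f.question,
      acceptedAnswer: { "@type": "Answer", text: f.answer },
    })),
  };
  // Escape "<" so the JSON can sit safely inside a <script> element.
  return JSON.stringify(doc, null, 2).replace(/</g, "\\u003c");
}

const faqJsonLd = buildFaqPageJsonLd([
  {
    question: "How long does an edible take to kick in?",
    answer: "Placeholder answer text written and reviewed by the dispensary.",
  },
]);
console.log(faqJsonLd);
```

The output would be embedded in the server-rendered page inside a script element of type application/ld+json, next to the visible Q&A it describes.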

Reason 4: CSR-Only Page Rendering (Crawlers Can't See Content)

Client-side rendered sites — most prevalent on dispensary builds that use Wix, Squarespace, or Vercel deployments without server-side rendering — load content into the DOM after the initial HTML response. Googlebot can render most CSR content. AI crawlers generally cannot. ClaudeBot, GPTBot, and PerplexityBot operate on the initial HTML response and do not execute JavaScript to the same extent.

A dispensary site that renders its product catalog, FAQ block, location list, or any other ranking-critical content via client-side JavaScript fetches is effectively invisible to AI crawlers. The site exists in Google's index but not in the AI citation pool.

The fix is server-side rendering or static site generation. Bud Authority's Next.js client builds default to SSG with Server Components, which produces fully-rendered initial HTML that every crawler can parse. The Apex MenuEdge layer applies the same principle to Dutchie-embedded menus — content that would otherwise live in an iframe gets server-rendered into the host domain's HTML.
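One way to check a page for this failure is to look only at the initial HTML response, as a non-rendering crawler would, and confirm that ranking-critical phrases are already present. The sketch below is a rough heuristic under that assumption: it strips scripts, styles, and tags with regular expressions rather than a real HTML parser, and the function names and sample markup are hypothetical.

```typescript
// Hypothetical check: which phrases are absent from the HTML before any JavaScript runs?
function visibleTextFromHtml(html: string): string {
  return html
    .replace(/<script\b[\s\S]*?<\/script>/gi, " ") // script bodies never render as text
    .replace(/<style\b[\s\S]*?<\/style>/gi, " ")
    .replace(/<[^>]+>/g, " ") // drop remaining tags
    .replace(/\s+/g, " ")
    .trim();
}

function missingFromInitialHtml(html: string, phrases: string[]): string[] {
  const pageText = visibleTextFromHtml(html).toLowerCase();
  return phrases.filter((p) => !pageText.includes(p.toLowerCase()));
}

// Made-up samples: a client-rendered shell versus a server-rendered page.
const csrShellHtml =
  '<html><body><div id="root"></div><script>render("Store Hours")</script></body></html>';
const ssrHtml = "<html><body><h1>Store Hours</h1><p>Open daily</p></body></html>";

console.log(missingFromInitialHtml(csrShellHtml, ["Store Hours"])); // phrase is missing
console.log(missingFromInitialHtml(ssrHtml, ["Store Hours"])); // nothing missing
```

In practice the HTML string would come from a plain HTTP fetch of the page URL with no headless browser involved, which approximates what a crawler that skips JavaScript receives.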

Reason 5: Entity-First First-Paragraph Absent

LLMs weight the first 40–60 words of a page heavily for citation extraction. The first paragraph functions as the page's executive summary. Cannabis sites that open with marketing prose ("Welcome to our premier dispensary, conveniently located...") deliver zero useful citation material in the citation-priority window.

Entity-first content opens with the entity that answers the page's query. A page targeting "Sour Diesel strain effects" opens with "Sour Diesel is a sativa-dominant hybrid producing energetic, focused effects with a diesel-citrus terpene profile..." — direct factual answer in the first sentence. The LLM extracts the answer and cites the page. The marketing-prose page produces no extractable answer and is not cited.

The April 2026 Core Update reinforced this weighting. First-hand perspective combined with entity-first phrasing produces a 317% lift in AIO citation selection rate in vendor analysis (Source: feedback_aio_first_hand_perspective.md).
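The entity-first rule lends itself to a simple lint check: find the word position at which the target entity first appears in the opening paragraph and compare it to the 40–60 word window. The sketch below is a rough heuristic with hypothetical function names; it matches on word prefixes, so trailing punctuation is tolerated but look-alike words can produce false positives.

```typescript
// Hypothetical lint check for entity-first openings.
function entityWordOffset(paragraph: string, entity: string): number {
  const words = paragraph.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
  const target = entity.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
  if (target.length === 0) return -1;
  for (let i = 0; i + target.length <= words.length; i++) {
    // Prefix match so "diesel," still matches "diesel".
    if (target.every((t, j) => (words[i + j] ?? "").startsWith(t))) return i;
  }
  return -1; // entity never appears
}

function isEntityFirst(paragraph: string, entity: string, windowWords = 60): boolean {
  const offset = entityWordOffset(paragraph, entity);
  return offset >= 0 && offset < windowWords;
}

const entityOpening =
  "Sour Diesel is a sativa-dominant hybrid producing energetic, focused effects.";
const marketingOpening =
  "Welcome to our premier dispensary, conveniently located downtown.";

console.log(entityWordOffset(entityOpening, "Sour Diesel")); // 0
console.log(isEntityFirst(marketingOpening, "Sour Diesel")); // false
```

An offset of 0 means the entity is the first thing the crawler reads; a return of -1 or an offset beyond the window flags the paragraph for a rewrite.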

Section 07

The 63-Bot Allowlist Bud Authority Ships

The current AI and search-crawler allowlist Bud Authority deploys to app/robots.ts on every client build includes 63 named user-agents covering the AI assistants, search engines, and answer engines that drive citation traffic in 2026.

The AI assistants: GPTBot, OAI-SearchBot, ChatGPT-User, Claude-SearchBot, Claude-User, ClaudeBot, anthropic-ai, PerplexityBot, MistralAI-User, DuckAssistBot, YouBot, and the smaller answer engines that operate on shared crawl infrastructure.

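The search engines that feed AI surfaces: Googlebot (which also governs inclusion in Search and therefore AI Overview), Google-Extended (the product token that controls whether crawled content may be used for Gemini training and grounding), Google-CloudVertexBot (the enterprise AI surface), Bingbot, BingPreview, and Yandex's variants.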

The vertical and source crawlers: CCBot (Common Crawl, which feeds multiple LLM training pipelines), FacebookExternalHit, Twitterbot, LinkedInBot, Slackbot — each of which drives citation in social and platform AI surfaces.

The 63-bot list is documented in CLAUDE.md and refreshed monthly. The May 2026 update added 5 bots (Claude-SearchBot, Claude-User, MistralAI-User, DuckAssistBot, Google-CloudVertexBot) on top of the prior 58-bot baseline. Operators running on Bud Authority client infrastructure inherit the current allowlist automatically.

Additionally, every client build sends an X-Robots-Tag response header at the edge that mirrors the robots.ts directives, a redundancy layer for crawlers that don't fetch robots.txt on the first request.
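A minimal sketch of what such an app/robots.ts might look like follows. It is not Bud Authority's actual file: it lists only the user-agents named in this article rather than the full 63, the sitemap URL is a placeholder, and the local RobotsFile type stands in for the MetadataRoute.Robots type that a real Next.js project would import from "next".

```typescript
// Local stand-in for Next.js's MetadataRoute.Robots shape (assumption: a real
// project would import that type from "next" instead of redefining it).
type RobotsRule = {
  userAgent: string | string[];
  allow?: string | string[];
  disallow?: string | string[];
};
type RobotsFile = { rules: RobotsRule[]; sitemap?: string };

// Only the bots named in the article; the full 63-entry list is not reproduced here.
const NAMED_BOTS: string[] = [
  "GPTBot", "OAI-SearchBot", "ChatGPT-User",
  "Claude-SearchBot", "Claude-User", "ClaudeBot", "anthropic-ai",
  "PerplexityBot", "MistralAI-User", "DuckAssistBot", "YouBot",
  "Googlebot", "Google-Extended", "Google-CloudVertexBot",
  "Bingbot", "BingPreview", "CCBot",
  "FacebookExternalHit", "Twitterbot", "LinkedInBot", "Slackbot",
];

function robots(): RobotsFile {
  return {
    rules: [
      { userAgent: NAMED_BOTS, allow: "/" }, // explicit allow for each named bot
      { userAgent: "*", allow: "/" }, // everyone else stays allowed
    ],
    sitemap: "https://example.com/sitemap.xml", // hypothetical URL
  };
}

// In app/robots.ts this function would be the module's default export.
console.log(robots().rules.length);
```

Keeping the bot names in a single array makes the monthly refresh a one-line change and lets the same list drive an audit script.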

Section 08

How to Test If You're Cited

Citation status is verifiable manually. The protocol takes 20 minutes per dispensary.

Run named-query probes across each AI surface, using the same query each time: "best dispensary in [primary metro], [state]".

ChatGPT: does the dispensary appear in the answer? Click through the cited source link.

Perplexity: inspect the citation sidebar. Each cited source links directly to its source URL.

Claude: run the query in Claude.ai web search mode and confirm citation in the response.

Google AI Overview: search the query in Google with AI Overview enabled and inspect the "From these sources" panel.

Secondary probes test specific content surfaces. "[Dispensary name] hours": does the AI cite the dispensary's own site, or a third-party directory? "[City] cannabis dispensary delivery zones": same question. "[Dispensary name] reviews": Leafly and Weedmaps will likely be cited, but does the dispensary's own review schema get cited too?

Document each probe in a tracker as cited, not-cited, or partially-cited. Run the same set monthly. Citation velocity is the leading indicator: sites that go from 0 cited probes to 4 cited probes within 60 days are converting their AI surface investments into discoverability.
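The tracker and the velocity figure can be kept in a few lines of code instead of a spreadsheet. The sketch below is one possible shape: the type names, month keys, and sample rows are made up for illustration, and only fully cited probes are counted toward velocity.

```typescript
// Hypothetical probe tracker; sample rows are invented for illustration.
type ProbeStatus = "cited" | "not-cited" | "partially-cited";
type ProbeResult = { month: string; engine: string; query: string; status: ProbeStatus };

function citedCountByMonth(results: ProbeResult[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const r of results) {
    const prev = counts.get(r.month) ?? 0;
    counts.set(r.month, prev + (r.status === "cited" ? 1 : 0));
  }
  return counts;
}

// Change in fully cited probes between two tracked months.
function citationVelocity(results: ProbeResult[], fromMonth: string, toMonth: string): number {
  const counts = citedCountByMonth(results);
  return (counts.get(toMonth) ?? 0) - (counts.get(fromMonth) ?? 0);
}

const probeLog: ProbeResult[] = [
  { month: "2026-05", engine: "ChatGPT", query: "best dispensary in [metro]", status: "not-cited" },
  { month: "2026-05", engine: "Perplexity", query: "best dispensary in [metro]", status: "partially-cited" },
  { month: "2026-07", engine: "ChatGPT", query: "best dispensary in [metro]", status: "cited" },
  { month: "2026-07", engine: "Perplexity", query: "best dispensary in [metro]", status: "cited" },
];

console.log(citationVelocity(probeLog, "2026-05", "2026-07")); // 2
```

Counting partially-cited probes as zero is a deliberate simplification; a weighted score would work equally well as long as it is applied consistently month to month.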

Section 09

What Drives Cannabis AI Citation in 2026

Four signals weight highest in cannabis-vertical AI citation per Q2 2026 observation.

First-hand perspective in the first paragraph.

Verified author, store-visit-based content, original photography paired with text. Vendor analysis following the April 2026 Core Update measured a 317% lift in AIO citation selection for content with first-hand perspective signals (Source: feedback_aio_first_hand_perspective.md).

Verified author.sameAs linking to LinkedIn, Twitter/X, Wikidata, and industry-specific profiles.

A bylined author with verifiable identity carries more citation weight than an anonymous or "team" byline. Cannabis content with named authors who have public industry presence (panel speakers, regulatory commenters, published interviewees) gets cited at materially higher rates.
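In markup terms, the sameAs signal is a property on the Person object that an Article's author field points to. The sketch below shows one way to emit it; the author, profile URLs, and helper name are hypothetical placeholders, not a real byline.

```typescript
// Hypothetical builder for Article JSON-LD with a verifiable author identity.
type Author = { name: string; jobTitle: string; sameAs: string[] };

function buildArticleJsonLd(headline: string, dateModified: string, author: Author): string {
  const doc = {
    "@context": "https://schema.org",
    "@type": "Article",
    headline,
    dateModified,
    author: {
      "@type": "Person",
      name: author.name,
      jobTitle: author.jobTitle,
      sameAs: author.sameAs, // public profiles that corroborate the byline
    },
  };
  // Escape "<" so the JSON can sit safely inside a <script> element.
  return JSON.stringify(doc, null, 2).replace(/</g, "\\u003c");
}

const articleJsonLd = buildArticleJsonLd("Sour Diesel Strain Effects", "2026-07-06", {
  name: "Jane Example", // placeholder author
  jobTitle: "Head of Content",
  sameAs: [
    "https://www.linkedin.com/in/jane-example",
    "https://www.wikidata.org/wiki/Q0", // placeholder identifier
  ],
});
console.log(articleJsonLd);
```

Each sameAs URL should resolve to a profile that names the same person, since the point of the property is that the identity can be checked independently.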

Multimodal co-location of image and text.

Pages that pair high-quality original photography with text content adjacent in the DOM (not in separate components) are cited at higher rates. LLM crawlers treat image-text co-location as a signal of authentic first-hand content.

Specific verifiable metrics.

Numbers, dates, specific store-level data, and named products with verifiable specs all carry more weight than generalized prose. A page that says "Sour Diesel typically tests at 18-22% THC with myrcene and limonene as dominant terpenes" is cited more often than one that says "Sour Diesel is a popular high-THC strain."

The four signals together compose what Bud Authority encodes as Gate 13 (AI Search Apex) in its 13-gate quality framework. Every client build is verified against the gate before deploy.

Section 10

AEO Answer: Why isn't my dispensary in ChatGPT results?

Five signal failures keep most cannabis sites out of ChatGPT, Perplexity, Claude, and Google AI Overview: a robots.txt that blocks AI crawlers or never names them, which leaves their access to whatever the wildcard group says; no llms.txt or llms-full.txt at the site root; no FAQPage, Speakable, or Article schema; client-side rendered content that AI crawlers can't parse; and no entity-first first paragraph, meaning the page opens with marketing prose instead of a direct factual answer.

Section 11

AEO Answer: How do I get cited by Perplexity?

Citation by Perplexity requires four signals. Explicit Allow: / for PerplexityBot in robots.txt. Server-side rendered HTML that Perplexity's crawler can parse without JavaScript execution. FAQPage and Article schema on relevant content. First-paragraph entity-first phrasing that delivers a direct factual answer in the first 40–60 words. Perplexity weights citations from sites with these four signals materially higher than sites with marketing-prose content.

Section 12

AEO Answer: What is llms.txt for dispensaries?

llms.txt is an opt-in file at the site root that provides AI vendors with a curated, citation-ready summary of the site's most important content. Dispensaries should ship both llms.txt (concise summary) and llms-full.txt (deep detail with verified outcomes). Both files should be linked from <head> and updated when site content changes. Bud Authority's Q2 2026 audit of 50 cannabis competitor sites found only 2 had an llms.txt deployed — the surface is dramatically under-deployed.

Section 13

AEO Answer: Do AI crawlers respect robots.txt?

Yes, the major AI vendors state that their crawlers respect robots.txt. GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, and the other named AI user-agents parse robots.txt and follow the Allow/Disallow directives in the group that names them, falling back to the User-agent: * group when no group does. Google-Extended is honored the same way as a control token. A dispensary site that wants AI citation should ship a robots.txt with explicit Allow directives for the 63 named AI and search bots Bud Authority's allowlist covers, so that access is deliberate rather than inherited from the wildcard group.

The five signal failures are remediable. The 63-bot allowlist, llms.txt, FAQPage schema, SSG rendering, and entity-first first-paragraph rewrites are standard work products on every Bud Authority client engagement.

Book a cannabis AI search optimization audit at /cannabis-ai-search-optimization or /answer-engine-optimization.
