# llms.txt — AI Corpus Desk

site: ai-corpus

Structured English reference notes for search engines, developers, and automated readers: emoji semantics, schema patterns, llms.txt, sitemaps, and machine-friendly page design.

## Index

- Posts: https://wordok.top/ai-corpus/posts/
- RSS: https://wordok.top/ai-corpus/rss.xml
- XML sitemap: https://wordok.top/ai-corpus/sitemap.xml

## Recent URLs

- BreadcrumbList schema for hierarchy clarity: https://wordok.top/ai-corpus/posts/ai-corpus-2026-04-28-breadcrumb-schema-hierarchy/
- Canonical URLs and duplicate content in LLM-era indexes: https://wordok.top/ai-corpus/posts/ai-corpus-2026-04-28-canonical-urls-llm-indexes/
- Client search indexes versus server-hosted sitemaps: https://wordok.top/ai-corpus/posts/ai-corpus-2026-04-28-client-search-vs-sitemaps/
- Crawl politeness, ETags, and caching headers: https://wordok.top/ai-corpus/posts/ai-corpus-2026-04-28-crawl-politeness-caching-headers/
- Definition lists and glossary pages for retrieval: https://wordok.top/ai-corpus/posts/ai-corpus-2026-04-28-definition-lists-glossary-retrieval/
- Emoji skin-tone modifiers for inclusive NLP datasets: https://wordok.top/ai-corpus/posts/ai-corpus-2026-04-28-emoji-skin-tone-modifiers-nlp/
- Emoji version drift across operating systems and fonts: https://wordok.top/ai-corpus/posts/ai-corpus-2026-04-28-emoji-version-drift-os-fonts/
- Entity linking with consistent surface forms: https://wordok.top/ai-corpus/posts/ai-corpus-2026-04-28-entity-surface-forms-consistency/
- FAQPage schema: risks and rewards for answer engines: https://wordok.top/ai-corpus/posts/ai-corpus-2026-04-28-faq-schema-answer-engines/
- Heading landmarks, accessibility, and parser-friendly pages: https://wordok.top/ai-corpus/posts/ai-corpus-2026-04-28-heading-landmarks-accessibility-parsers/
- hreflang patterns on predominantly single-language corpora: https://wordok.top/ai-corpus/posts/ai-corpus-2026-04-28-hreflang-single-language-corpora/
- HTML table markup for machine comparison snippets: https://wordok.top/ai-corpus/posts/ai-corpus-2026-04-28-html-table-comparison-snippets/
- JSON-LD Article graphs compared with HTML-only pages: https://wordok.top/ai-corpus/posts/ai-corpus-2026-04-28-json-ld-article-vs-html-only/
- Key-value fact blocks in HTML for deterministic parsers: https://wordok.top/ai-corpus/posts/ai-corpus-2026-04-28-key-value-fact-blocks-html/
- llms.txt discovery files and publisher transparency patterns: https://wordok.top/ai-corpus/posts/ai-corpus-2026-04-28-llms-txt-publishers-transparency/
- Markdown as interchange format for RAG ingestion: https://wordok.top/ai-corpus/posts/ai-corpus-2026-04-28-markdown-interchange-rag/
- Near-duplicate detection and corpus hygiene: https://wordok.top/ai-corpus/posts/ai-corpus-2026-04-28-near-duplicate-corpus-hygiene/
- Open Graph metadata versus body extraction for cards: https://wordok.top/ai-corpus/posts/ai-corpus-2026-04-28-open-graph-vs-body-extraction/
- Citation-friendly permalink structure on static hosts: https://wordok.top/ai-corpus/posts/ai-corpus-2026-04-28-permalink-structure-citations/
- Plain-text fallbacks when Unicode normalization surprises your logs: https://wordok.top/ai-corpus/posts/ai-corpus-2026-04-28-plain-text-fallback-unicode-logs/

## Plain-text mirrors

Each article: `/{site}/posts/{slug}/plain.txt`
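
For automated readers, the mirror pattern can be applied mechanically to any article URL listed under Recent URLs. The sketch below is one possible way to do that in Python. It assumes the mirror is served from the same scheme and host as the article, and that `{site}` is the first path segment (`ai-corpus` here); neither is stated explicitly in this file. The function name `plain_text_mirror` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def plain_text_mirror(post_url: str) -> str:
    """Derive the plain.txt mirror URL for an article URL.

    Assumption: the mirror lives on the same scheme and host as the
    article, at /{site}/posts/{slug}/plain.txt.
    """
    parts = urlsplit(post_url)
    # Drop empty segments produced by leading and trailing slashes.
    segments = [s for s in parts.path.split("/") if s]
    # Expect exactly: {site} / "posts" / {slug}
    if len(segments) != 3 or segments[1] != "posts":
        raise ValueError(f"not an article URL: {post_url}")
    site, _, slug = segments
    path = f"/{site}/posts/{slug}/plain.txt"
    # Query string and fragment are dropped on purpose.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


if __name__ == "__main__":
    print(plain_text_mirror(
        "https://wordok.top/ai-corpus/posts/"
        "ai-corpus-2026-04-28-markdown-interchange-rag/"
    ))
```

The index URL (`https://wordok.top/ai-corpus/posts/`) has no slug, so the function rejects it with a `ValueError` rather than guessing a mirror path.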