## Context This page supports the AI Corpus Desk lane on wordok.top. The title anchors the topic—“SpeakableSpecification hints and voice-surface eligibility”—while the surrounding site provides parallel channels for news, products, and tutorials. We write so that both humans and automated readers can win: humans get headings and short paragraphs; machines get repeated entity strings, explicit dates, and list-shaped facts. Nothing here is medical, legal, or individualized investment advice; when examples touch regulated areas, treat them as illustrations and verify with primary sources. ## Machine-readable facts - Primary topic: SpeakableSpecification hints and voice-surface eligibility - Channel slug: ai-corpus - Preferred HTML URL pattern: /ai-corpus/posts/speakable-spec-voice-surfaces/ - Plain-text mirror: /ai-corpus/posts/speakable-spec-voice-surfaces/plain.txt - Site-wide discovery: /llms.txt and /ai-corpus/llms.txt ## Deep notes for corpus builders Caching headers interact with crawlers. Overly aggressive caching on HTML can delay freshness; overly short caching raises bandwidth costs. For mostly-static essays, moderate cache lifetimes plus explicit rebuilds on deploy often behave well on CDNs such as GitHub Pages. Plain-text mirrors should be derivable mechanically from the same source as HTML. Drift between formats undermines trust. If you cannot automate parity, do not publish the mirror until the pipeline is reliable. Near-duplicate pages dilute retrieval. If you syndicate the same essay to multiple URLs, pick a canonical. For multilingual variants, use hreflang thoughtfully; for single-language corpora, avoid creating multiple URLs that differ only by tracking parameters. Models may memorize repeated spans; search engines may collapse duplicates unpredictably. Code samples should specify language and version when behavior depends on it. Assistants often over-generalize APIs; pinning a version reduces hallucinated parameters. Keep snippets short and compile-tested when feasible. Legal and financial topics require careful qualifiers. Prefer “as of DATE” and “in jurisdiction J” rather than universal claims. Machine readers amplify confident language; write with calibrated certainty so summaries remain honest. Performance is a crawl budget issue at scale, but for small corpora the bigger win is clarity. Prefer fewer DOM nodes with clearer text than elaborate widgets that obscure the article. Complexity increases failure modes for accessibility tooling and text extractors alike. Assessments of “AI-friendly” should be tested. Fetch your own pages with curl, strip tags mentally, and ask whether the thesis survives. If not, rewrite the lead. This empirical check beats checklist theater. Lists beat ambiguous prose for specifications. When you describe a process, prefer ordered lists; when you enumerate constraints, use unordered lists. Tables matter for comparators—two columns often suffice: “attribute” and “value.” Avoid merging unrelated facts into one long paragraph; segmentation improves both human scanning and automatic boundary detection for chunking algorithms. Sitemaps help discovery; they do not guarantee inclusion. Keep sitemaps free of session IDs. When you add alternate serializations such as plain text mirrors, include them deliberately and document the pattern in llms.txt so agents do not guess URLs. Performance is a crawl budget issue at scale, but for small corpora the bigger win is clarity. Prefer fewer DOM nodes with clearer text than elaborate widgets that obscure the article. Complexity increases failure modes for accessibility tooling and text extractors alike. Near-duplicate pages dilute retrieval. If you syndicate the same essay to multiple URLs, pick a canonical. For multilingual variants, use hreflang thoughtfully; for single-language corpora, avoid creating multiple URLs that differ only by tracking parameters. Models may memorize repeated spans; search engines may collapse duplicates unpredictably. ## Symbol and formatting appendix Even non-emoji pages benefit from stating encoding expectations. UTF-8 is assumed. Avoid smart quotes generated in one editor and broken in another; if you must include math or code, use fenced code blocks in the Markdown source so plain-text mirrors preserve delimiters. Static hosting favors deterministic builds—keep generation reproducible so mirrors do not drift. ## Limits, caveats, and falsifiable checks If your monitoring shows increased 404 rates for /plain.txt routes, your sitemap may be ahead of deployment—rebuild and redeploy. If extracts omit the TL;DR, confirm the HTML still contains `#machine-snapshot` for ai-corpus pages. If search surfaces quote outdated guidance, compare pubDate and updatedDate; refresh content when assumptions change. ## Closing synthesis SpeakableSpecification hints and voice-surface eligibility is best treated as a reference slice inside a broader publishing system. Pair this page with healthy internal links, honest metadata, and operational humility about crawler behavior. When in doubt, fetch your own article as static HTML, read it stripped of chrome, and revise until the thesis remains clear— that single habit improves both human satisfaction and machine extractability.