llms.txt Generator
Shipped
Generate structured llms.txt files from any website. Enter a URL, get LLM-ready content. Open source.
- Pages Crawled: 20
- Output Files: 2
- Streaming: SSE
- Storage: Zero
Highlights
- Crawls up to 20 pages via Firecrawl and produces both llms.txt (concise index) and llms-full.txt (full markdown content) in a single run
- Real-time crawl progress streams to the browser via Server-Sent Events, rendered in a terminal-style UI so you can watch pages arrive as they are processed
- Zero server-side storage: the API key never leaves the request, and output files are generated in memory and delivered directly as downloads
The Problem
The llms.txt standard gives AI crawlers a structured entry point into your site, but creating one manually means reading every page, writing summaries, and formatting them to spec. For a 20-page site, that is two to three hours of work that most developers skip entirely. Existing generators either cover only the homepage or produce unstructured dumps that defeat the purpose of the standard.
What I Built
Standards-Compliant Output
Generates both llms.txt (concise index: title, URL, description per page) and llms-full.txt (complete markdown body of every crawled page). Both files follow the emerging llms.txt community spec so AI crawlers and developer tools can parse them consistently.
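The two output shapes can be illustrated with a minimal sketch. The `CrawledPage` interface, the function names, and the "Pages" section title are assumptions made for illustration, not the project's actual code. The index layout follows the llms.txt convention of an H1 title, a blockquote summary, and markdown link lines; the layout of the full file is one plausible choice.

```typescript
// Hypothetical shape for one crawled page; the field names are assumptions.
interface CrawledPage {
  title: string;
  url: string;
  description: string;
  markdown: string;
}

// Concise index: H1 site name, blockquote summary, then one link line per page.
function renderLlmsTxt(siteName: string, summary: string, pages: CrawledPage[]): string {
  const links = pages.map((p) => `- [${p.title}](${p.url}): ${p.description}`);
  return [`# ${siteName}`, "", `> ${summary}`, "", "## Pages", "", ...links, ""].join("\n");
}

// Full file: same header, then each page's markdown body under its own heading.
function renderLlmsFullTxt(siteName: string, summary: string, pages: CrawledPage[]): string {
  const sections = pages.map(
    (p) => `## ${p.title}\n\nSource: ${p.url}\n\n${p.markdown.trim()}\n`
  );
  return [`# ${siteName}`, "", `> ${summary}`, "", ...sections].join("\n");
}
```

Both functions take the same page list, so a single crawl can feed both files without fetching anything twice.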
Real-Time Crawl Streaming
Server-Sent Events stream progress back to the browser as pages are processed. A terminal-style log shows each URL, its status (crawled / skipped / failed), and the reason. Users watch the crawl happen in real time instead of staring at a spinner.
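As a rough sketch of what one progress message might look like on the wire, the helper below serializes an event in the `text/event-stream` format and renders the matching log line. The `CrawlProgress` type, the `progress` event name used in the test, and both function names are hypothetical; a real Hono app would more likely rely on the framework's streaming helper than on hand-built strings.

```typescript
// Hypothetical progress event; the status values mirror the ones named above.
type CrawlStatus = "crawled" | "skipped" | "failed";

interface CrawlProgress {
  url: string;
  status: CrawlStatus;
  reason?: string;
}

// Serialize one message in the text/event-stream wire format:
// an "event:" line, a "data:" line, and a blank line that ends the message.
// JSON.stringify without an indent argument never emits raw newlines,
// so the payload always fits on a single "data:" line.
function formatSseEvent(eventName: string, payload: CrawlProgress): string {
  return `event: ${eventName}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// Render the same event as one line of the terminal-style log.
function formatLogLine(p: CrawlProgress): string {
  const suffix = p.reason ? ` (${p.reason})` : "";
  return `[${p.status}] ${p.url}${suffix}`;
}
```

On the client, each parsed event would be appended to the log as it arrives, which is what replaces the spinner.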
Zero Server-Side Storage
The Firecrawl API key is passed per-request and never persisted. Crawled content is held in memory only during processing. Output files are generated in memory and delivered directly as downloads. No database, no sessions, no logs. Privacy by architecture.
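One way to deliver a file without writing it anywhere is to return the generated text directly with download headers. The sketch below only builds the body and headers; `buildDownload` and `DownloadPayload` are hypothetical names, and the `Cache-Control` header is an added assumption rather than something the project states.

```typescript
// Hypothetical container for an in-memory download response.
interface DownloadPayload {
  body: string;
  headers: Record<string, string>;
}

// Wrap generated text as a file download. Nothing is written to disk:
// the content lives only in this object until the response is sent.
function buildDownload(filename: string, content: string): DownloadPayload {
  return {
    body: content,
    headers: {
      "Content-Type": "text/plain; charset=utf-8",
      // "attachment" asks the browser to save the response instead of rendering it.
      "Content-Disposition": `attachment; filename="${filename}"`,
      // Assumption: also ask intermediaries not to keep a copy of the response.
      "Cache-Control": "no-store",
    },
  };
}
```

A request handler would pass `body` and `headers` straight into its response object, so the content goes out of scope as soon as the request completes.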
20-Page Site Walk
The crawler follows the site's internal links and visits up to 20 pages, starting from the homepage. It skips external links, duplicates, and non-HTML resources. Page descriptions are extracted from meta descriptions or first-paragraph heuristics. Content is cleaned: scripts, styles, and navigation are stripped.
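The selection rules above (same site only, no duplicates, no non-HTML resources, capped at 20) can be sketched as a small filter. This is illustrative only: the project delegates crawling to Firecrawl, and the function name, the extension list, and the fragment-stripping step are assumptions.

```typescript
// File extensions treated as non-HTML; this list is an illustrative assumption.
const NON_HTML_EXTENSIONS = /\.(pdf|png|jpe?g|gif|svg|webp|css|js|json|xml|zip|mp4)$/i;

// Pick up to `limit` unique, same-origin, HTML-looking URLs, homepage first.
function selectCrawlTargets(homepage: string, links: string[], limit = 20): string[] {
  const origin = new URL(homepage).origin;
  const seen = new Set<string>();
  const targets: string[] = [];

  for (const raw of [homepage, ...links]) {
    if (targets.length >= limit) break;

    let url: URL;
    try {
      url = new URL(raw, homepage); // resolves relative links against the homepage
    } catch {
      continue; // ignore malformed hrefs
    }

    url.hash = ""; // fragments point at the same document
    const normalized = url.href;

    if (url.origin !== origin) continue; // external link
    if (NON_HTML_EXTENSIONS.test(url.pathname)) continue; // non-HTML resource
    if (seen.has(normalized)) continue; // duplicate

    seen.add(normalized);
    targets.push(normalized);
  }
  return targets;
}
```

Calling `selectCrawlTargets("https://example.com/", hrefs)` with the links found on the homepage yields the ordered list of pages to fetch, with the homepage always first.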
Tech Stack
- React
- TypeScript
- Vite
- Hono
- Firecrawl
- Tailwind CSS
- shadcn/ui