llms.txt Generator

Shipped

Solo Builder · Side Project · 2025

Generate structured llms.txt files from any website. Enter a URL, get LLM-ready content. Open source.

Pages Crawled: up to 20
Output Files: 2
Streaming: SSE
Storage: Zero

Highlights

→ Crawls up to 20 pages via Firecrawl and produces both llms.txt (concise index) and llms-full.txt (full markdown content) in a single run
→ Real-time crawl progress streams to the browser via Server-Sent Events, rendered in a terminal-style UI so you can watch pages arrive as they are processed
→ Zero server-side storage: the Firecrawl API key is used only for the current request and never persisted, and output files are generated in memory and delivered directly as downloads

The Problem

The llms.txt standard gives AI crawlers a structured entry point into your site, but creating one manually means reading every page, writing summaries, and formatting them to spec. For a 20-page site, that is two to three hours of work that most developers skip entirely. Existing generators either cover only the homepage or produce unstructured dumps that defeat the purpose of the standard.

What I Built

Standards-Compliant Output

Generates both llms.txt (concise index: title, URL, description per page) and llms-full.txt (complete markdown body of every crawled page). Both files follow the emerging llms.txt community spec so AI crawlers and developer tools can parse them consistently.
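As a rough illustration of the two outputs, the sketch below assembles an index and a full-content file from a list of crawled pages. The `CrawledPage` shape, the function names, and the "Pages" section heading are assumptions made for this example, not the project's actual code; the index layout (H1 title, blockquote summary, H2 section of markdown links) follows the structure described in the llms.txt proposal.

```typescript
// Hypothetical page shape; the project's real types are not shown here.
interface CrawledPage {
  title: string;
  url: string;
  description: string;
  markdown: string;
}

// Concise index: one markdown link line per page under a single section.
function buildLlmsTxt(siteName: string, summary: string, pages: CrawledPage[]): string {
  const links = pages.map((p) => `- [${p.title}](${p.url}): ${p.description}`);
  return [`# ${siteName}`, "", `> ${summary}`, "", "## Pages", "", ...links, ""].join("\n");
}

// Full content: every page's markdown body, each under its own heading.
// The per-page "Source:" line is an illustrative choice, not part of any spec.
function buildLlmsFullTxt(siteName: string, pages: CrawledPage[]): string {
  const sections = pages.map(
    (p) => `## ${p.title}\n\nSource: ${p.url}\n\n${p.markdown.trim()}`
  );
  return [`# ${siteName}`, ...sections].join("\n\n") + "\n";
}
```

Both functions are pure string builders, so they can run after the crawl finishes without touching disk.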

Real-Time Crawl Streaming

Server-Sent Events stream progress back to the browser as pages are processed. A terminal-style log shows each URL, its status (crawled / skipped / failed), and the reason. Users watch the crawl happen in real time instead of staring at a spinner.
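To make the streaming format concrete, here is a minimal sketch of how one progress update could be framed as a Server-Sent Event and rendered as a log line. The `progress` event name, the `CrawlProgress` fields, and both function names are assumptions for illustration; only the `event:` / `data:` framing with a blank-line terminator comes from the SSE format itself.

```typescript
type CrawlStatus = "crawled" | "skipped" | "failed";

// Hypothetical payload shape for one progress update.
interface CrawlProgress {
  url: string;
  status: CrawlStatus;
  reason?: string;
}

// Frame a payload as one SSE message. JSON.stringify (without an indent
// argument) escapes newlines inside strings, so a single data: line is enough.
function formatSseEvent(event: string, payload: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// Render one update the way a terminal-style log might show it.
function formatLogLine(p: CrawlProgress): string {
  const base = `[${p.status.toUpperCase()}] ${p.url}`;
  return p.reason ? `${base} (${p.reason})` : base;
}
```

On the browser side, an `EventSource` listener for the same event name would parse `event.data` with `JSON.parse` and append the formatted line to the log.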

Zero Server-Side Storage

The Firecrawl API key is passed per-request and never persisted. Crawled content is held in memory only during processing. Output files are generated in memory and delivered directly as downloads. No database, no sessions, no logs. Privacy by architecture.
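One way to get this behavior, sketched under assumptions: read the key from a request header on each call and return the generated text as an attachment built from an in-memory string. The `x-firecrawl-key` header name and both helper names are hypothetical, and the project may deliver downloads differently (for example, client-side). The sketch relies only on the standard Fetch API `Request` and `Response` classes.

```typescript
// Hypothetical header name; the key is read per request and kept only in a
// local variable, never written to disk or a session.
function readApiKey(req: Request): string | null {
  return req.headers.get("x-firecrawl-key");
}

// Wrap an in-memory string as a file download. Nothing is written to disk.
function asDownload(filename: string, body: string): Response {
  return new Response(body, {
    headers: {
      "Content-Type": "text/plain; charset=utf-8",
      "Content-Disposition": `attachment; filename="${filename}"`,
      // Ask intermediaries and the browser not to cache the generated file.
      "Cache-Control": "no-store",
    },
  });
}
```

Because the response body is built from a string held in memory, the content becomes eligible for garbage collection once the response has been sent.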

20-Page Site Walk

The crawl follows internal links and visits up to 20 pages, starting from the homepage. It skips external links, duplicates, and non-HTML resources. Page descriptions come from the meta description, falling back to a first-paragraph heuristic. Content is cleaned by stripping scripts, styles, and navigation.
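The filtering and description rules can be sketched as two small functions. This is an illustrative approximation, not the project's code: in the real tool the crawl runs through Firecrawl, and the extension list, the 160-character cap, and the function names here are all assumptions.

```typescript
// Assumed list of extensions treated as non-HTML resources.
const NON_HTML = /\.(pdf|png|jpe?g|gif|svg|webp|ico|css|js|json|xml|zip|mp4|woff2?)$/i;

// Keep same-origin, de-duplicated, HTML-looking URLs, starting with the
// homepage itself, up to the page limit.
function selectInternalLinks(start: string, hrefs: string[], limit = 20): string[] {
  const origin = new URL(start).origin;
  const seen = new Set<string>();
  const picked: string[] = [];
  for (const href of [start, ...hrefs]) {
    if (picked.length >= limit) break;
    let url: URL;
    try {
      url = new URL(href, start); // resolves relative links against the start URL
    } catch {
      continue; // unparseable href
    }
    if (url.origin !== origin) continue; // external link
    if (NON_HTML.test(url.pathname)) continue; // non-HTML resource
    url.hash = ""; // fragments point at the same page
    const key = url.href.replace(/\/$/, ""); // treat /docs and /docs/ as one page
    if (seen.has(key)) continue; // duplicate
    seen.add(key);
    picked.push(url.href);
  }
  return picked;
}

// Prefer the meta description; otherwise fall back to the first non-heading
// paragraph of the page's markdown, collapsed to one line and truncated.
function describePage(metaDescription: string | undefined, markdown: string, maxLen = 160): string {
  const meta = metaDescription?.trim();
  if (meta) return meta;
  const firstParagraph = markdown
    .split(/\n\s*\n/)
    .map((block) => block.trim())
    .find((block) => block.length > 0 && !block.startsWith("#"));
  if (!firstParagraph) return "";
  const oneLine = firstParagraph.replace(/\s+/g, " ");
  return oneLine.length > maxLen ? oneLine.slice(0, maxLen - 3).trimEnd() + "..." : oneLine;
}
```

Normalizing away fragments and trailing slashes before the duplicate check keeps a small page budget from being spent on the same page twice.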

Tech Stack


React
TypeScript
Vite
Hono
Firecrawl
Tailwind CSS
shadcn/ui

Related Projects

FreeLLM: OpenAI-compatible gateway that pools 6 free LLM tiers into one endpoint. Multi-key rotation, circuit breakers, response cache.
Masari Employee Portal: Bilingual intranet for a Saudi government entity. 33 pages, 1,800+ translation keys, sole frontend dev.
UnifyHQ: Enterprise facility management platform. 471 API endpoints, 194 pages, 8 languages. Built in 26 days.
