llm-wiki-compiler turns raw sources into a wiki
7 features show how llm-wiki-compiler turns notes, docs, and papers into a linked wiki with citations and audit trails.

llm-wiki-compiler turns raw sources into an interlinked, cited wiki.
GitHub’s atomicstrata/llm-wiki-compiler packages a two-phase pipeline, a local viewer, and agent-friendly export paths for durable knowledge work. The repo has about 1.5k stars and 155 forks.
| Item | Primary payoff | Notable detail |
|---|---|---|
| Compiled wiki | Structured output | Typed pages with citations |
| Hybrid retrieval | Better evidence selection | Embeddings, BM25, graph expansion |
| Local viewer | Browse and inspect | Search, graph, citation chips |
| Eval harness | Quality checks | Health score and regression deltas |
| MCP server | Agent access | Context packs for Claude Desktop and Cursor |
1. Compiled wiki output
The core promise is simple: feed in raw material and get back a persistent wiki instead of a pile of loose chunks. The compiler turns sources into typed pages such as concept, entity, comparison, and overview, with paragraph- and claim-level citations tied back to source line ranges.

That matters when you want a reference artifact that can be read later without re-running the whole discovery step. The repo is inspired by Karpathy’s LLM Wiki pattern, but it adds explicit provenance and page typing so the output is easier to trust and reuse.
- Inputs: notes, docs, papers, READMEs, ADRs
- Outputs: interlinked markdown pages
- Citations: source line ranges at claim level
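To make claim-level provenance concrete, here is a minimal sketch of how a typed page with line-range citations could be modeled. The type names, field names, and the `resolveCitation` helper are illustrative assumptions, not the project's actual schema.

```typescript
// Hypothetical shapes for illustration; the project's real schema may differ.
type PageKind = "concept" | "entity" | "comparison" | "overview";

interface Citation {
  sourcePath: string; // file the claim was drawn from
  startLine: number;  // 1-indexed, inclusive
  endLine: number;    // 1-indexed, inclusive
}

interface Claim {
  text: string;
  citations: Citation[];
}

interface WikiPage {
  title: string;
  kind: PageKind;
  claims: Claim[];
}

// Return the exact source lines a citation points at (empty if the source is unknown).
function resolveCitation(c: Citation, sources: Map<string, string>): string[] {
  const body = sources.get(c.sourcePath);
  if (body === undefined) return [];
  return body.split("\n").slice(c.startLine - 1, c.endLine);
}

// Toy source file and page, invented for this example.
const sources = new Map<string, string>([
  ["notes/retrieval.md", "# Retrieval\nBM25 is a lexical ranking function.\nIt rewards rare terms."],
]);

const page: WikiPage = {
  title: "BM25",
  kind: "concept",
  claims: [
    {
      text: "BM25 ranks documents lexically and rewards rare terms.",
      citations: [{ sourcePath: "notes/retrieval.md", startLine: 2, endLine: 3 }],
    },
  ],
};

for (const claim of page.claims) {
  for (const c of claim.citations) {
    console.log(claim.text, "<-", resolveCitation(c, sources));
  }
}
```

Storing line ranges rather than copied quotes keeps each claim checkable against the source as it currently stands.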
2. Hybrid retrieval pipeline
Search is not just vector lookup. The project uses incremental, content-hash-aware embeddings to narrow the candidate set, then BM25 reranking and wikilink-graph expansion to assemble the final evidence pack.
That mix helps when a source base grows past a few files and recall starts to matter more than raw similarity. It also means the compiler can keep retrieval focused while still surfacing connected pages that a pure embedding pass might miss.
- Semantic chunk embeddings for top-K narrowing
- BM25 reranking for lexical precision
- Graph expansion for linked context
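The three stages above can be sketched on a toy corpus. This is not the project's code: the `Page` shape, the two-dimensional stand-in vectors, the `retrieve` signature, and the one-hop expansion rule are all assumptions chosen to keep the example small. The BM25 function uses the standard Okapi formula with the common `k1 = 1.2` and `b = 0.75` defaults.

```typescript
// Toy corpus; names, vectors, and parameters are illustrative assumptions.
interface Page {
  id: string;
  text: string;
  embedding: number[]; // stand-in for a real embedding vector
  links: string[];     // outgoing wikilinks
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y; na += x * x; nb += y * y;
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

const tokenize = (s: string): string[] => s.toLowerCase().match(/[a-z0-9]+/g) ?? [];

// Okapi BM25 scores for each document in the candidate set.
function bm25(query: string, docs: Page[], k1 = 1.2, b = 0.75): Map<string, number> {
  const tokens = docs.map((d) => tokenize(d.text));
  const avgLen = tokens.reduce((n, t) => n + t.length, 0) / Math.max(docs.length, 1);
  const terms = Array.from(new Set(tokenize(query)));
  const scores = new Map<string, number>();
  docs.forEach((doc, i) => {
    const docTokens = tokens[i] ?? [];
    let score = 0;
    for (const term of terms) {
      const tf = docTokens.filter((t) => t === term).length;
      if (tf === 0) continue;
      const df = tokens.filter((t) => t.includes(term)).length;
      const idf = Math.log(1 + (docs.length - df + 0.5) / (df + 0.5));
      score += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * docTokens.length) / avgLen));
    }
    scores.set(doc.id, score);
  });
  return scores;
}

function retrieve(query: string, queryVec: number[], pages: Page[], topK: number, keep: number): string[] {
  // 1. Embedding similarity narrows the corpus to topK candidates.
  const candidates = [...pages]
    .sort((p, q) => cosine(queryVec, q.embedding) - cosine(queryVec, p.embedding))
    .slice(0, topK);
  // 2. BM25 reranks the candidates; keep the best few.
  const lexical = bm25(query, candidates);
  const ranked = candidates
    .sort((p, q) => (lexical.get(q.id) ?? 0) - (lexical.get(p.id) ?? 0))
    .slice(0, keep)
    .map((p) => p.id);
  // 3. One hop of wikilink expansion pulls in pages linked from the kept ones.
  const known = new Set(pages.map((p) => p.id));
  const pack = [...ranked];
  for (const id of ranked) {
    const page = pages.find((p) => p.id === id);
    for (const link of page?.links ?? []) {
      if (known.has(link) && !pack.includes(link)) pack.push(link);
    }
  }
  return pack;
}

const corpus: Page[] = [
  { id: "bm25", text: "BM25 reranking improves lexical precision", embedding: [0.9, 0.1], links: ["graph"] },
  { id: "embeddings", text: "Embeddings narrow the candidate set by semantic similarity", embedding: [0.8, 0.3], links: [] },
  { id: "graph", text: "Wikilink graph expansion adds linked context", embedding: [0.1, 0.9], links: ["bm25"] },
  { id: "viewer", text: "The viewer shows citation chips", embedding: [0.2, 0.8], links: [] },
];

const evidence = retrieve("bm25 reranking", [1, 0], corpus, 2, 1);
console.log(evidence); // ["bm25", "graph"]
```

In this toy run the `graph` page scores poorly on embedding similarity and never enters the top-K, yet it still reaches the evidence pack because the top-ranked page links to it. That is the kind of connected context a pure embedding pass would miss.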
3. Local web viewer
llmwiki view opens a read-only browser UI for inspecting the compiled wiki. You get sidebar navigation, search, a force-directed graph, and provenance chips on each page, which makes it easier to trace where a claim came from.

This is the part that turns the compiled output into something people can actually audit. Instead of reading raw logs or JSON, you browse the knowledge base the way you would browse a small internal encyclopedia.
- Read-only browser interface
- Sidebar navigation and search
- Graph view plus citation chips
4. Eval harness and health checks
The repo includes llmwiki eval, which scores wiki health from 0 to 100 and reports citation coverage, precision, and regression deltas. It also offers LLM-as-judge scoring of how well sources support claims, plus threshold checks that fit into CI.
For teams, this is a practical guardrail. You can tell whether a new ingest improved the wiki, broke citations, or quietly degraded quality before the changes spread into downstream use.
llmwiki eval --threshold 85 --judge --json
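The idea behind a threshold gate can be sketched in a few lines. The article does not show the real report format or scoring formula, so everything here is an assumption: the `EvalClaim` and `GateResult` shapes are hypothetical, and the score is plain citation coverage rather than the tool's full health score.

```typescript
// Hypothetical shapes; the real `llmwiki eval` report format is not shown in this article.
interface EvalClaim {
  text: string;
  citationCount: number;
}

interface GateResult {
  score: number;   // 0 to 100
  delta: number;   // change from the previous run
  passed: boolean; // true when score meets the threshold
}

// Citation coverage as a 0-100 score: the share of claims with at least one citation.
function coverageScore(claims: EvalClaim[]): number {
  if (claims.length === 0) return 0;
  const cited = claims.filter((c) => c.citationCount > 0).length;
  return Math.round((100 * cited) / claims.length);
}

// Compare the score with a threshold and report the regression delta.
function gate(claims: EvalClaim[], threshold: number, previousScore: number): GateResult {
  const score = coverageScore(claims);
  return { score, delta: score - previousScore, passed: score >= threshold };
}

const claims: EvalClaim[] = [
  { text: "BM25 reranks candidates.", citationCount: 2 },
  { text: "The viewer is read-only.", citationCount: 1 },
  { text: "Exports are typed JSON.", citationCount: 1 },
  { text: "Everything is fast.", citationCount: 0 },
];

const result = gate(claims, 85, 90);
console.log(result); // { score: 75, delta: -15, passed: false }
```

A CI job would typically turn `passed: false` into a non-zero exit code so the ingest that dropped coverage is blocked before it merges.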
5. Freshness, rollback, and audit history
Knowledge bases rot when sources move. This project tracks stale and orphaned pages, supports targeted repair with llmwiki refresh --stale, and writes a durable operation log so each ingest, compile, and query is recorded.
It also adds rollback and diff-oriented reports, which help when you need to reverse a bad ingest or explain why a page changed. The repo even caches the latest lint summary in .llmwiki/last-lint.json so the viewer can show recent health results without rerunning lint.
- Stale-claim checks and freshness reports
- Reverse ingest and compile diff reports
- Timestamped log.md audit trail
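One plausible way to detect rot is to compare content hashes recorded at compile time against the sources as they exist now. The sketch below works under stated assumptions: "stale" means the source changed or disappeared, "orphaned" means no other page links to a non-root page, and the `CompiledPage` shape and FNV-1a hash are stand-ins rather than the project's actual definitions.

```typescript
// FNV-1a (32-bit, over UTF-16 code units): a small stand-in for a real content hash.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

// Hypothetical record of a compiled page.
interface CompiledPage {
  id: string;
  sourcePath: string;
  sourceHash: string; // hash of the source at compile time
  links: string[];    // outgoing wikilinks
}

interface Freshness {
  stale: string[];
  orphaned: string[];
}

function checkFreshness(pages: CompiledPage[], currentSources: Map<string, string>, roots: string[]): Freshness {
  // Stale: the source is gone or its content hash no longer matches.
  const stale = pages
    .filter((p) => {
      const body = currentSources.get(p.sourcePath);
      return body === undefined || fnv1a(body) !== p.sourceHash;
    })
    .map((p) => p.id);
  // Orphaned: no page links here, and the page is not a designated root.
  const linked = new Set<string>();
  for (const p of pages) for (const l of p.links) linked.add(l);
  const orphaned = pages
    .filter((p) => !linked.has(p.id) && !roots.includes(p.id))
    .map((p) => p.id);
  return { stale, orphaned };
}

const compiled: CompiledPage[] = [
  { id: "index", sourcePath: "a.md", sourceHash: fnv1a("alpha v1"), links: ["bm25"] },
  { id: "bm25", sourcePath: "b.md", sourceHash: fnv1a("beta v1"), links: [] },
  { id: "scratch", sourcePath: "c.md", sourceHash: fnv1a("gamma v1"), links: [] },
];

// b.md has been edited since the last compile.
const current = new Map<string, string>([
  ["a.md", "alpha v1"],
  ["b.md", "beta v2"],
  ["c.md", "gamma v1"],
]);

const freshness = checkFreshness(compiled, current, ["index"]);
console.log(freshness); // { stale: ["bm25"], orphaned: ["scratch"] }
```

Because only the pages whose hashes changed are flagged, a targeted repair can touch those pages and leave the rest of the wiki alone.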
6. MCP server and in-process SDK
llmwiki serve exposes the pipeline through MCP, so tools like Claude Desktop, Cursor, and Claude Code can ask for budgeted, citation-aware context packs. That makes the compiler usable as an agent memory layer instead of only a standalone CLI.
For developers who want direct integration, createWiki({ root }) runs ingest, compile, query, status, freshness, export, and eval inside your own process. That is a cleaner fit for custom tooling than shelling out on every step.
createWiki({ root }).query("what changed?")
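A "budgeted, citation-aware context pack" can be pictured as ranked evidence trimmed to a token budget, with each item keeping its citation. The sketch below is an assumption about the general idea, not the project's MCP payload or SDK return type. `Evidence`, `ContextPack`, `buildContextPack`, and the four-characters-per-token estimate are all hypothetical.

```typescript
// Hypothetical evidence item; field names are invented for this example.
interface Evidence {
  pageId: string;
  text: string;
  score: number;    // relevance from retrieval
  citation: string; // pointer back to the source, e.g. "notes/a.md:L2-L3"
}

interface ContextPack {
  items: Evidence[];
  tokensUsed: number;
}

// Rough token estimate: about four characters per token. A real tokenizer would differ.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Greedy packing: take evidence in relevance order, skipping anything that would exceed the budget.
function buildContextPack(evidence: Evidence[], budget: number): ContextPack {
  const items: Evidence[] = [];
  let tokensUsed = 0;
  for (const e of [...evidence].sort((a, b) => b.score - a.score)) {
    const cost = estimateTokens(e.text);
    if (tokensUsed + cost > budget) continue; // does not fit; a smaller item still might
    items.push(e);
    tokensUsed += cost;
  }
  return { items, tokensUsed };
}

const found: Evidence[] = [
  { pageId: "a", text: "BM25 reranks the embedding candidates.", score: 0.9, citation: "notes/a.md:L2-L2" },
  {
    pageId: "b",
    text: "Graph expansion follows wikilinks from the reranked pages to pull in connected context that similarity search alone might miss.",
    score: 0.8,
    citation: "notes/b.md:L5-L9",
  },
  { pageId: "c", text: "Pages carry citations.", score: 0.5, citation: "notes/c.md:L1-L1" },
];

const pack = buildContextPack(found, 20);
console.log(pack.items.map((e) => `${e.pageId} (${e.citation})`), pack.tokensUsed);
```

With a budget of 20 estimated tokens, the long second item is skipped and the short third item still fits, so the agent receives the most relevant evidence that the budget allows, each piece traceable to its source lines.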
7. Provider support and export paths
The tool is built to work across multiple model backends, including Anthropic, the Claude Agent SDK, OpenAI-compatible servers, Ollama, and GitHub Copilot. It also exports typed JSON envelopes that can be imported into @atomicmemory/llmwiki as verbatim Atomic Memory records.
That portability matters if your team mixes hosted APIs with local models. You can keep the same workflow while changing only the provider settings, which lowers the cost of trying the project in real environments.
- Anthropic and Claude Agent SDK support
- OpenAI-compatible local servers and Ollama
- JSON export for runtime memory systems
How to decide
Pick this project if you want a source-to-wiki pipeline that keeps citations, supports agents, and leaves an audit trail. It fits researchers, technical writers, and maintainers who need durable reference material rather than a one-off summary.
If your main need is quick search over a few files, the full compiler may be more than you need. If you want a browsable knowledge base that can be refreshed, evaluated, and handed to agents, this repo is built for that job.