6 min read · OraCore Editors

llm-wiki-compiler turns raw sources into a wiki

7 features show how llm-wiki-compiler turns notes, docs, and papers into a linked wiki with citations and audit trails.

llm-wiki-compiler turns raw sources into an interlinked, cited wiki.

The atomicstrata/llm-wiki-compiler repository on GitHub packages a two-phase pipeline, a local viewer, and agent-friendly export paths for durable knowledge work. The repo has about 1.5k stars and 155 forks.

| Item | Primary payoff | Notable detail |
| --- | --- | --- |
| Compiled wiki | Structured output | Typed pages with citations |
| Hybrid retrieval | Better evidence selection | Embeddings, BM25, graph expansion |
| Local viewer | Browse and inspect | Search, graph, citation chips |
| Eval harness | Quality checks | Health score and regression deltas |
| MCP server | Agent access | Context packs for Claude Desktop and Cursor |

1. Compiled wiki output

The core promise is simple: feed in raw material and get back a persistent wiki instead of a pile of loose chunks. The compiler turns sources into typed pages such as concept, entity, comparison, and overview, with paragraph- and claim-level citations tied back to source line ranges.

That matters when you want a reference artifact that can be read later without re-running the whole discovery step. The repo is inspired by Karpathy’s LLM Wiki pattern, but it adds explicit provenance and page typing so the output is easier to trust and reuse.

  • Inputs: notes, docs, papers, READMEs, ADRs
  • Outputs: interlinked markdown pages
  • Citations: source line ranges at claim level
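To make "typed pages with claim-level citations" concrete, here is a minimal TypeScript sketch of one possible data model and a markdown renderer. Only the four page types come from the project description; the type names (PageKind, Citation, Claim, WikiPage), the renderPage function, and the bracketed "file:start-end" citation format are assumptions for illustration, not the repo's actual schema.

```typescript
// Hypothetical data model: illustrative names, not the repo's schema.
type PageKind = "concept" | "entity" | "comparison" | "overview";

interface Citation {
  source: string;    // path of the raw source file
  startLine: number; // first cited line (1-based, inclusive)
  endLine: number;   // last cited line (inclusive)
}

interface Claim {
  text: string;
  citations: Citation[];
}

interface WikiPage {
  title: string;
  kind: PageKind;
  claims: Claim[];
}

// Render a citation as a compact "file:start-end" label (or "file:line").
function formatCitation(c: Citation): string {
  return c.startLine === c.endLine
    ? `${c.source}:${c.startLine}`
    : `${c.source}:${c.startLine}-${c.endLine}`;
}

// Render a page as markdown, appending each claim's citations inline.
function renderPage(page: WikiPage): string {
  const lines: string[] = [`# ${page.title}`, "", `_type: ${page.kind}_`, ""];
  for (const claim of page.claims) {
    const refs = claim.citations.map(formatCitation).join("; ");
    lines.push(refs ? `${claim.text} [${refs}]` : claim.text);
  }
  return lines.join("\n");
}

const examplePage: WikiPage = {
  title: "BM25",
  kind: "concept",
  claims: [
    {
      text: "BM25 ranks documents by term frequency with length normalization.",
      citations: [{ source: "papers/ir-notes.md", startLine: 12, endLine: 18 }],
    },
  ],
};

console.log(renderPage(examplePage));
```

Keeping the line range on every claim, rather than only on the page, is what lets a reader or a viewer jump from a single sentence back to the exact source lines.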

2. Hybrid retrieval pipeline

Search is not just vector lookup. The project uses incremental, content-hash-aware embeddings to narrow the candidate set, then BM25 reranking and wikilink-graph expansion to assemble the final evidence pack.

That mix helps when a source base grows past a few files and recall starts to matter more than raw similarity. It also means the compiler can keep retrieval focused while still surfacing connected pages that a pure embedding pass might miss.

  • Semantic chunk embeddings for top-K narrowing
  • BM25 reranking for lexical precision
  • Graph expansion for linked context
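The three stages above can be sketched in TypeScript as follows. This is an illustrative sketch, not the project's code: it assumes embeddings are already computed (toy two-dimensional vectors here), uses a naive tokenizer, applies the standard Okapi BM25 formula with common defaults (k1 = 1.2, b = 0.75), and expands one hop along wikilinks that are assumed to be pre-parsed into a links array. The names Doc, bm25, and retrieve are hypothetical.

```typescript
// Illustrative hybrid retrieval: embedding narrowing, BM25 rerank, graph expansion.
interface Doc {
  id: string;
  text: string;
  vector: number[]; // precomputed embedding (assumed)
  links: string[];  // ids this page links to via [[wikilinks]] (assumed pre-parsed)
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na > 0 && nb > 0 ? dot / Math.sqrt(na * nb) : 0;
}

// Naive tokenizer: lowercase alphanumeric runs.
function tokenize(s: string): string[] {
  return s.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Okapi BM25 score for each doc in `docs`; IDF and average length come from `corpus`.
function bm25(query: string, docs: Doc[], corpus: Doc[], k1 = 1.2, b = 0.75): Map<string, number> {
  const corpusTokens = corpus.map((d) => tokenize(d.text));
  const avgdl = corpusTokens.reduce((sum, t) => sum + t.length, 0) / corpusTokens.length;
  const n = corpus.length;
  const terms = Array.from(new Set(tokenize(query)));
  const scores = new Map<string, number>();
  for (const d of docs) {
    const tokens = tokenize(d.text);
    let score = 0;
    for (const term of terms) {
      const tf = tokens.filter((t) => t === term).length;
      if (tf === 0) continue;
      const df = corpusTokens.filter((t) => t.includes(term)).length;
      const idf = Math.log((n - df + 0.5) / (df + 0.5) + 1);
      score += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * tokens.length) / avgdl));
    }
    scores.set(d.id, score);
  }
  return scores;
}

function retrieve(query: string, queryVec: number[], corpus: Doc[], topK = 3, finalN = 2): string[] {
  // Stage 1: keep the top-K docs by embedding similarity.
  const candidates = [...corpus]
    .sort((x, y) => cosine(queryVec, y.vector) - cosine(queryVec, x.vector))
    .slice(0, topK);
  // Stage 2: rerank the candidates lexically with BM25 and keep the best finalN.
  const scores = bm25(query, candidates, corpus);
  const best = candidates
    .sort((x, y) => (scores.get(y.id) ?? 0) - (scores.get(x.id) ?? 0))
    .slice(0, finalN);
  // Stage 3: expand one hop along wikilinks to pull in connected pages.
  const known = new Set(corpus.map((d) => d.id));
  const pack = new Set(best.map((d) => d.id));
  for (const d of best) {
    for (const id of d.links) {
      if (known.has(id)) pack.add(id);
    }
  }
  return Array.from(pack);
}

const corpus: Doc[] = [
  { id: "bm25", text: "BM25 is a lexical ranking function", vector: [1, 0], links: ["tf-idf"] },
  { id: "tf-idf", text: "TF-IDF weights terms by rarity", vector: [0, 1], links: [] },
  { id: "embeddings", text: "Embeddings map text to vectors", vector: [0.9, 0.1], links: [] },
];

console.log(retrieve("lexical ranking", [1, 0], corpus, 2, 1)); // returns ["bm25", "tf-idf"]
```

In the example, the embedding stage drops the tf-idf page, but graph expansion brings it back because the top-ranked bm25 page links to it. That is the kind of connected context a pure embedding pass can miss.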

3. Local web viewer

The llmwiki view command opens a read-only browser UI for inspecting the compiled wiki. It provides sidebar navigation, search, a force-directed graph, and provenance chips on each page, which makes it easier to trace where a claim came from.

This is the part that turns the compiled output into something people can actually audit. Instead of reading raw logs or JSON, you browse the knowledge base the way you would browse a small internal encyclopedia.

  • Read-only browser interface
  • Sidebar navigation and search
  • Graph view plus citation chips

4. Eval harness and health checks

The repo includes llmwiki eval, which scores wiki health from 0 to 100 and reports citation coverage, precision, and regression deltas. It can also use an LLM as a judge to score whether citations support their claims, and it applies threshold checks that fit into CI.

For teams, this is a practical guardrail. You can tell whether a new ingest improved the wiki, broke citations, or quietly degraded quality before the changes spread into downstream use.

```shell
# Gate on a health threshold of 85, enable LLM-as-judge scoring, print JSON
llmwiki eval --threshold 85 --judge --json
```
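The article does not spell out the health formula, so the TypeScript sketch below only illustrates the general shape of such a check: compute citation coverage and judge-based precision, fold them into a 0 to 100 score, compare against a threshold, and report the delta from a previous run. The 50/50 weighting, the ClaimCheck and EvalReport shapes, and the evaluateWiki and gate functions are all assumptions, not the tool's implementation.

```typescript
// Hypothetical inputs: one record per claim. `supported` is set only when an
// LLM judge has checked whether the cited lines back the claim.
interface ClaimCheck {
  citations: number;
  supported?: boolean;
}

interface EvalReport {
  coverage: number;  // share of claims with at least one citation
  precision: number; // share of judged, cited claims marked as supported
  health: number;    // 0-100 composite score
}

function evaluateWiki(claims: ClaimCheck[]): EvalReport {
  const cited = claims.filter((c) => c.citations > 0);
  const judged = cited.filter((c) => c.supported !== undefined);
  const coverage = claims.length > 0 ? cited.length / claims.length : 0;
  // With nothing judged, precision defaults to 1 (no evidence of bad citations).
  const precision = judged.length > 0 ? judged.filter((c) => c.supported).length / judged.length : 1;
  // Assumed 50/50 weighting; the project's actual formula may differ.
  const health = Math.round(100 * (0.5 * coverage + 0.5 * precision));
  return { coverage, precision, health };
}

// CI-style gate: pass/fail against a threshold plus the delta from a previous run.
function gate(current: EvalReport, previous: EvalReport | null, threshold: number) {
  return {
    pass: current.health >= threshold,
    delta: previous === null ? null : current.health - previous.health,
  };
}

const report = evaluateWiki([
  { citations: 2, supported: true },
  { citations: 1, supported: false },
  { citations: 1, supported: true },
  { citations: 0 },
]);
const verdict = gate(report, { coverage: 1, precision: 0.6, health: 80 }, 85);
// health 71 (coverage 0.75, precision 2/3): fails the 85 threshold, delta -9
console.log(report.health, verdict);
```

In CI, a failing verdict would typically translate into a non-zero exit code so the pipeline stops before a degraded wiki ships.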

5. Freshness, rollback, and audit history

Knowledge bases rot when sources move. This project tracks stale and orphaned pages, supports targeted repair with llmwiki refresh --stale, and writes a durable operation log so each ingest, compile, and query is recorded.

It also adds rollback and diff-oriented reports, which is useful when you need to reverse a bad ingest or explain why a page changed. The repo even caches the latest lint summary in .llmwiki/last-lint.json so viewer health can show recent results without rerunning lint.

  • Stale-claim checks and freshness reports
  • Reverse ingest and compile diff reports
  • Timestamped log.md audit trail
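One common way to implement stale and orphan detection is to store a hash of each source when a page is compiled and compare it later. The TypeScript sketch below assumes that approach; it is not confirmed as the project's method. It uses FNV-1a as a small dependency-free hash (a production tool would more likely use SHA-256), and the PageRecord shape, checkFreshness, logOperation, and the log line format are hypothetical stand-ins for the real log.md.

```typescript
// FNV-1a (32-bit) over UTF-16 code units: a small, dependency-free hash for this sketch.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

// Hypothetical record written at compile time for each page.
interface PageRecord {
  page: string;
  source: string;     // source file the page was compiled from
  sourceHash: string; // hash of that source when the page was compiled
}

type Freshness = "fresh" | "stale" | "orphaned";

// Stale: the source changed since compile. Orphaned: the source is gone.
function checkFreshness(records: PageRecord[], sources: Map<string, string>): Map<string, Freshness> {
  const result = new Map<string, Freshness>();
  for (const r of records) {
    const text = sources.get(r.source);
    if (text === undefined) {
      result.set(r.page, "orphaned");
    } else {
      result.set(r.page, fnv1a(text) === r.sourceHash ? "fresh" : "stale");
    }
  }
  return result;
}

// Append-only audit trail: one timestamped line per operation (format assumed).
const auditLog: string[] = [];
function logOperation(op: string, detail: string, at: Date = new Date()): void {
  auditLog.push(`- ${at.toISOString()} ${op}: ${detail}`);
}

const currentSources = new Map([
  ["notes/a.md", "alpha v2"], // edited since the last compile
  ["notes/b.md", "beta"],     // unchanged; notes/c.md has been deleted
]);
const records: PageRecord[] = [
  { page: "Alpha", source: "notes/a.md", sourceHash: fnv1a("alpha v1") },
  { page: "Beta", source: "notes/b.md", sourceHash: fnv1a("beta") },
  { page: "Gamma", source: "notes/c.md", sourceHash: fnv1a("gamma") },
];
const freshness = checkFreshness(records, currentSources);
logOperation("refresh", "stale: Alpha; orphaned: Gamma", new Date("2024-01-01T00:00:00Z"));
console.log(freshness, auditLog);
```

Because only pages whose hashes changed are flagged, a targeted repair can recompile Alpha alone instead of rebuilding the whole wiki.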

6. MCP server and in-process SDK

The llmwiki serve command exposes the pipeline through MCP (Model Context Protocol), so tools like Claude Desktop, Cursor, and Claude Code can ask for budgeted, citation-aware context packs. That makes the compiler usable as an agent memory layer instead of only a standalone CLI.

For developers who want direct integration, createWiki({ root }) runs ingest, compile, query, status, freshness, export, and eval inside your own process. That is a cleaner fit for custom tooling than shelling out on every step.

```typescript
// In-process SDK call; `root` is the wiki's root directory
createWiki({ root }).query("what changed?")
```
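A budgeted, citation-aware context pack can be sketched as a greedy packer: take pages in relevance order and include each one, together with its citations, only if it still fits the token budget. The RankedPage and ContextPack shapes, the buildContextPack function, the four-characters-per-token estimate, and the greedy strategy are assumptions for illustration, not the MCP server's actual behavior.

```typescript
// Hypothetical shapes; the real MCP tool's request/response format may differ.
interface RankedPage {
  title: string;
  body: string;
  citations: string[]; // e.g. "notes/a.md:10-14"
  score: number;       // relevance from retrieval
}

interface ContextPack {
  pages: { title: string; body: string; citations: string[] }[];
  tokensUsed: number;
  budget: number;
}

// Rough token estimate (~4 characters per token); a real tokenizer would be more accurate.
const estimateTokens = (s: string): number => Math.ceil(s.length / 4);

function buildContextPack(ranked: RankedPage[], budget: number): ContextPack {
  const pack: ContextPack = { pages: [], tokensUsed: 0, budget };
  const ordered = [...ranked].sort((a, b) => b.score - a.score);
  for (const p of ordered) {
    // Citations count toward the budget, so an included page never loses them.
    const cost = estimateTokens(p.body) + estimateTokens(p.citations.join(" "));
    if (pack.tokensUsed + cost > budget) continue; // skip pages that would overflow
    pack.pages.push({ title: p.title, body: p.body, citations: p.citations });
    pack.tokensUsed += cost;
  }
  return pack;
}

const pages: RankedPage[] = [
  { title: "A", body: "x".repeat(400), citations: ["a.md:1-5"], score: 0.9 },
  { title: "B", body: "y".repeat(800), citations: ["b.md:2-9"], score: 0.8 },
  { title: "C", body: "z".repeat(120), citations: ["c.md:3-4"], score: 0.5 },
];
const contextPack = buildContextPack(pages, 150);
// Pages A and C fit (134 tokens); B would overflow the 150-token budget.
console.log(contextPack.pages.map((p) => p.title), contextPack.tokensUsed);
```

Skipping an oversized page and continuing, rather than stopping at the first overflow, lets smaller but still relevant pages use the remaining budget.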

7. Provider support and export paths

The tool is built to work across multiple model backends, including Anthropic, the Claude Agent SDK, OpenAI-compatible servers, Ollama, and GitHub Copilot. It also exports typed JSON envelopes that can be imported into @atomicmemory/llmwiki as verbatim Atomic Memory records.

That portability matters if your team mixes hosted APIs with local models. You can keep the same workflow while changing only the provider settings, which lowers the cost of trying the project in real environments.

  • Anthropic and Claude Agent SDK support
  • OpenAI-compatible local servers and Ollama
  • JSON export for runtime memory systems
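The export path can be illustrated with a JSON Lines round trip. The envelope fields below (schemaVersion, kind, id, content, citations) and the toJsonLines and fromJsonLines helpers are a hypothetical shape, not the documented schema that @atomicmemory/llmwiki consumes. The sketch only shows why a typed, versioned envelope helps: content survives verbatim, and a consumer can reject records it does not understand.

```typescript
// Hypothetical envelope shape; the real export schema may use different fields.
interface Envelope {
  schemaVersion: 1;
  kind: "concept" | "entity" | "comparison" | "overview";
  id: string;
  content: string; // page text, stored verbatim
  citations: { source: string; lines: [number, number] }[];
}

// One JSON object per line; JSON.stringify escapes newlines inside content.
function toJsonLines(envelopes: Envelope[]): string {
  return envelopes.map((e) => JSON.stringify(e)).join("\n");
}

const KNOWN_KINDS = new Set(["concept", "entity", "comparison", "overview"]);

// Parse and minimally validate, so a consumer can reject malformed records.
function fromJsonLines(text: string): Envelope[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => {
      const e = JSON.parse(line);
      if (e.schemaVersion !== 1 || !KNOWN_KINDS.has(e.kind) ||
          typeof e.id !== "string" || typeof e.content !== "string") {
        throw new Error(`invalid envelope: ${line}`);
      }
      return e as Envelope;
    });
}

const exported = toJsonLines([
  {
    schemaVersion: 1,
    kind: "concept",
    id: "bm25",
    content: "BM25 is a lexical ranking function.\nIt rewards rare query terms.",
    citations: [{ source: "notes/ir.md", lines: [3, 7] }],
  },
]);
const imported = fromJsonLines(exported);
console.log(imported[0].id, imported[0].content);
```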

How to decide

Pick this project if you want a source-to-wiki pipeline that keeps citations, supports agents, and leaves an audit trail. It fits researchers, technical writers, and maintainers who need durable reference material rather than a one-off summary.

If your main need is quick search over a few files, the full compiler may be more than you need. If you want a browsable knowledge base that can be refreshed, evaluated, and handed to agents, this repo is built for that job.