llm-wiki-compiler turns raw sources into a wiki

OraCore Editors

[IND] June 14, 2026 · 6 min read · OraCore Editors

7 features show how llm-wiki-compiler turns notes, docs, and papers into a linked wiki with citations and audit trails.

llm-wiki-compiler turns raw sources into an interlinked, cited wiki.

GitHub’s atomicstrata/llm-wiki-compiler packages a two-phase pipeline, a local viewer, and agent-friendly export paths for durable knowledge work. The repo has about 1.5k stars and 155 forks.

Item	Primary payoff	Notable detail
Compiled wiki	Structured output	Typed pages with citations
Hybrid retrieval	Better evidence selection	Embeddings, BM25, graph expansion
Local viewer	Browse and inspect	Search, graph, citation chips
Eval harness	Quality checks	Health score and regression deltas
MCP server	Agent access	Context packs for Claude Desktop and Cursor

1. Compiled wiki output

The core promise is simple: feed in raw material and get back a persistent wiki instead of a pile of loose chunks. The compiler turns sources into typed pages such as concept, entity, comparison, and overview, with paragraph- and claim-level citations tied back to source line ranges.

That matters when you want a reference artifact that can be read later without re-running the whole discovery step. The repo is inspired by Karpathy’s LLM Wiki pattern, but it adds explicit provenance and page typing so the output is easier to trust and reuse.

Inputs: notes, docs, papers, READMEs, ADRs
Outputs: interlinked markdown pages
Citations: source line ranges at claim level
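
As a rough illustration of what "typed pages with claim-level citations" could look like as data, here is a minimal sketch. The type names, field names, and the `path:Lx-Ly` reference format are assumptions made for this example, not the project's actual schema.

```typescript
// Hypothetical shapes for illustration; the project's real schema may differ.
type PageKind = "concept" | "entity" | "comparison" | "overview";

interface Citation {
  source: string;    // path of the raw source file
  startLine: number; // 1-indexed, inclusive
  endLine: number;   // 1-indexed, inclusive
}

interface Claim {
  text: string;
  citations: Citation[];
}

interface WikiPage {
  title: string;
  kind: PageKind;
  claims: Claim[];
}

// Render a citation as a compact reference such as "notes/retrieval.md:L2-L3".
function formatCitation(c: Citation): string {
  const range =
    c.startLine === c.endLine ? `L${c.startLine}` : `L${c.startLine}-L${c.endLine}`;
  return `${c.source}:${range}`;
}

// Resolve a citation back to the exact source lines it points at.
function resolveCitation(c: Citation, sourceText: string): string[] {
  return sourceText.split("\n").slice(c.startLine - 1, c.endLine);
}

const page: WikiPage = {
  title: "Hybrid retrieval",
  kind: "concept",
  claims: [
    {
      text: "BM25 reranks the embedding candidates.",
      citations: [{ source: "notes/retrieval.md", startLine: 2, endLine: 3 }],
    },
  ],
};

// Print each claim next to the source ranges that back it.
for (const claim of page.claims) {
  console.log(claim.text, claim.citations.map(formatCitation).join(", "));
}
```

Storing line ranges rather than free-text references is what would let a reader, or a checker, jump from any claim to the lines that support it.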

2. Hybrid retrieval pipeline

Search is not just vector lookup. The project uses incremental, content-hash-aware embeddings to narrow the candidate set, then BM25 reranking and wikilink-graph expansion to assemble the final evidence pack.

That mix helps when a source base grows past a few files and recall starts to matter more than raw similarity. It also means the compiler can keep retrieval focused while still surfacing connected pages that a pure embedding pass might miss.

Semantic chunk embeddings for top-K narrowing
BM25 reranking for lexical precision
Graph expansion for linked context
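
The three stages above can be sketched as a small pipeline. This is an illustration of the general technique under stated assumptions, not the project's code: the `Chunk` shape, the function names, the precomputed embeddings, and the single-hop expansion are all hypothetical, and the BM25 constants are the commonly used defaults.

```typescript
// Illustrative three-stage retrieval. All names and parameters are assumptions.
interface Chunk {
  id: string;
  text: string;
  embedding: number[]; // assumed to be precomputed by some embedding model
  links: string[];     // ids reachable through wikilinks
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0, y = b[i] ?? 0;
    dot += x * y; na += x * x; nb += y * y;
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

const tokenize = (s: string): string[] => s.toLowerCase().match(/[a-z0-9]+/g) ?? [];

// Stage 1: keep the K chunks whose embeddings are closest to the query.
function topK(chunks: Chunk[], queryVec: number[], k: number): Chunk[] {
  return [...chunks]
    .sort((p, q) => cosine(q.embedding, queryVec) - cosine(p.embedding, queryVec))
    .slice(0, k);
}

// Stage 2: Okapi BM25 over the candidate set only.
function bm25Rerank(cands: Chunk[], query: string, k1 = 1.2, b = 0.75): Chunk[] {
  const docs = cands.map(c => tokenize(c.text));
  const avgLen = docs.reduce((n, d) => n + d.length, 0) / Math.max(docs.length, 1);
  const terms = tokenize(query);
  const score = (doc: string[]): number => {
    let total = 0;
    for (const t of terms) {
      const tf = doc.filter(w => w === t).length;
      if (tf === 0) continue;
      const df = docs.filter(d => d.indexOf(t) !== -1).length;
      const idf = Math.log(1 + (docs.length - df + 0.5) / (df + 0.5));
      total += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * doc.length) / avgLen));
    }
    return total;
  };
  return cands
    .map((c, i) => ({ c, s: score(docs[i] ?? []) }))
    .sort((p, q) => q.s - p.s)
    .map(x => x.c);
}

// Stage 3: append one hop of wikilinked neighbours that the ranking missed.
function expandByLinks(ranked: Chunk[], all: Map<string, Chunk>): Chunk[] {
  const seen = new Set(ranked.map(c => c.id));
  const out = [...ranked];
  for (const c of ranked) {
    for (const id of c.links) {
      const neighbour = all.get(id);
      if (neighbour && !seen.has(id)) { seen.add(id); out.push(neighbour); }
    }
  }
  return out;
}

function retrieve(chunks: Chunk[], query: string, queryVec: number[], k: number): Chunk[] {
  const byId = new Map(chunks.map(c => [c.id, c] as [string, Chunk]));
  return expandByLinks(bm25Rerank(topK(chunks, queryVec, k), query), byId);
}
```

The ordering matters for cost: the cheap vector pass shrinks the pool, BM25 only scores what survives, and link expansion runs last so it can pull in a connected page even when its embedding is far from the query.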

3. Local web viewer

llmwiki view opens a read-only browser UI for inspecting the compiled wiki. You get sidebar navigation, search, a force-directed graph, and provenance chips on each page, which makes it easier to trace where a claim came from.

This is the part that turns the compiled output into something people can actually audit. Instead of reading raw logs or JSON, you browse the knowledge base the way you would browse a small internal encyclopedia.

Read-only browser interface
Sidebar navigation and search
Graph view plus citation chips

4. Eval harness and health checks

The repo includes llmwiki eval, which scores wiki health from 0 to 100 and reports citation coverage, precision, and regression deltas. It can also score claim support with an LLM as judge and apply threshold checks that fit into CI.

For teams, this is a practical guardrail. You can tell whether a new ingest improved the wiki, broke citations, or quietly degraded quality before the changes spread into downstream use.

llmwiki eval --threshold 85 --judge --json
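
To make the idea of a coverage metric and a CI threshold concrete, here is a minimal sketch. The formula, the type names, and the pass rule are assumptions for illustration only; the article does not document how llmwiki eval computes its score or what its JSON output contains.

```typescript
// Hypothetical metric and gate; not the tool's actual scoring.
interface EvalClaim {
  text: string;
  citationCount: number;
}

// Percentage of claims that carry at least one citation, rounded to an integer.
function citationCoverage(claims: EvalClaim[]): number {
  if (claims.length === 0) return 0; // assumption: an empty wiki scores 0
  const cited = claims.filter(c => c.citationCount > 0).length;
  return Math.round((cited / claims.length) * 100);
}

interface GateResult {
  score: number;
  delta: number; // change against the previous run (the regression delta)
  pass: boolean;
}

// Compare the current score with the previous run and a minimum threshold.
function gate(score: number, previous: number, threshold: number): GateResult {
  return { score, delta: score - previous, pass: score >= threshold };
}

const claims: EvalClaim[] = [
  { text: "Pages are typed.", citationCount: 2 },
  { text: "Retrieval is hybrid.", citationCount: 1 },
  { text: "The viewer is read-only.", citationCount: 1 },
  { text: "An uncited statement.", citationCount: 0 },
];

// Three of four claims are cited, so coverage is 75 and a threshold of 85 fails.
console.log(gate(citationCoverage(claims), 90, 85));
```

In a CI job, a failing gate like this would typically translate into a non-zero exit code so the ingest that caused the drop is caught before merge.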

5. Freshness, rollback, and audit history

Knowledge bases rot when sources move. This project tracks stale and orphaned pages, supports targeted repair with llmwiki refresh --stale, and writes a durable operation log so each ingest, compile, and query is recorded.

It also adds rollback and diff-oriented reports, which are useful when you need to reverse a bad ingest or explain why a page changed. The repo even caches the latest lint summary in .llmwiki/last-lint.json so viewer health can show recent results without rerunning lint.

Stale-claim checks and freshness reports
Reverse ingest and compile diff reports
Timestamped log.md audit trail
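
One way freshness tracking like this can work is by comparing a stored content hash against the current source. The sketch below assumes that approach; the record shape, the definitions of "stale" and "orphaned", the log line format, and the FNV-1a hash (a stand-in for whatever hash the project uses) are all hypothetical.

```typescript
// FNV-1a over UTF-16 code units: a small non-cryptographic stand-in hash.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return ("00000000" + h.toString(16)).slice(-8);
}

interface PageRecord {
  page: string;
  source: string;     // path of the source the page was compiled from
  sourceHash: string; // hash of that source at compile time
}

type Freshness = "fresh" | "stale" | "orphaned";

// Assumed definitions: orphaned means the source is gone,
// stale means the source changed since the page was compiled.
function classify(rec: PageRecord, sources: Map<string, string>): Freshness {
  const current = sources.get(rec.source);
  if (current === undefined) return "orphaned";
  return fnv1a(current) === rec.sourceHash ? "fresh" : "stale";
}

// One append-only audit line per operation, in the spirit of log.md.
function logLine(op: string, detail: string, at: Date): string {
  return `- ${at.toISOString()} ${op}: ${detail}`;
}

const sources = new Map<string, string>([
  ["a.md", "hello"],
  ["b.md", "edited text"],
]);
const records: PageRecord[] = [
  { page: "A", source: "a.md", sourceHash: fnv1a("hello") },
  { page: "B", source: "b.md", sourceHash: fnv1a("original text") },
  { page: "C", source: "gone.md", sourceHash: fnv1a("anything") },
];

for (const rec of records) {
  console.log(logLine("freshness", `${rec.page} is ${classify(rec, sources)}`, new Date()));
}
```

A targeted repair would then only need to recompile the pages classified as stale, which is the kind of saving a command like llmwiki refresh --stale suggests.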

6. MCP server and in-process SDK

llmwiki serve exposes the pipeline through MCP, so tools like Claude Desktop, Cursor, and Claude Code can ask for budgeted, citation-aware context packs. That makes the compiler usable as an agent memory layer instead of only a standalone CLI.

For developers who want direct integration, createWiki({ root }) runs ingest, compile, query, status, freshness, export, and eval inside your own process. That is a cleaner fit for custom tooling than shelling out on every step.

createWiki({ root }).query("what changed?")
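
A "budgeted, citation-aware context pack" can be pictured as evidence selected to fit a token limit, with each piece keeping its citation. The sketch below is a generic greedy packer under that reading. It is not the MCP tool schema or the SDK's return type; the `Evidence` shape, the function names, and the four-characters-per-token estimate are assumptions.

```typescript
// Hypothetical sketch of budgeted evidence selection.
interface Evidence {
  pageTitle: string;
  text: string;
  citation: string; // e.g. "notes/retrieval.md:L2-L3"
  score: number;    // relevance from retrieval, higher is better
}

// Rough token estimate: about four characters per token (a common heuristic).
const estimateTokens = (s: string): number => Math.ceil(s.length / 4);

// Greedily take the highest-scoring evidence that still fits the budget.
function buildContextPack(items: Evidence[], budgetTokens: number): Evidence[] {
  const pack: Evidence[] = [];
  let used = 0;
  for (const item of [...items].sort((a, b) => b.score - a.score)) {
    const cost = estimateTokens(item.text);
    if (used + cost > budgetTokens) continue; // skip items that do not fit
    pack.push(item);
    used += cost;
  }
  return pack;
}

// Format the pack so every snippet keeps its citation attached.
function renderPack(pack: Evidence[]): string {
  return pack.map(e => `[${e.pageTitle}] ${e.text} (${e.citation})`).join("\n");
}
```

Greedy selection is not optimal for every budget, but it is predictable: an agent asking for a smaller budget gets a prefix-like subset of the most relevant evidence rather than a reshuffled set.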

7. Provider support and export paths

The tool is built to work across multiple model backends, including Anthropic, the Claude Agent SDK, OpenAI-compatible servers, Ollama, and GitHub Copilot. It also exports typed JSON envelopes that can be imported into @atomicmemory/llmwiki as verbatim Atomic Memory records.

That portability matters if your team mixes hosted APIs with local models. You can keep the same workflow while changing only the provider settings, which lowers the cost of trying the project in real environments.

Anthropic and Claude Agent SDK support
OpenAI-compatible local servers and Ollama
JSON export for runtime memory systems

How to decide

Pick this project if you want a source-to-wiki pipeline that keeps citations, supports agents, and leaves an audit trail. It fits researchers, technical writers, and maintainers who need durable reference material rather than a one-off summary.

If your main need is quick search over a few files, the full compiler may be more than you need. If you want a browsable knowledge base that can be refreshed, evaluated, and handed to agents, this repo is built for that job.
