Karpathy LLM Wiki Memory Method

By Sutopo

June 21, 2026 | 9 Min Read

How Karpathy’s LLM Wiki Turns Raw Sources into a Self-Maintaining Knowledge Base

TL;DR – Quick Summary

Karpathy’s LLM Wiki method replaces chunk-and-retrieve RAG with a three-layer system: raw sources, compiled wiki pages, and a schema document that governs the structure
The LLM acts as a knowledge compiler, transforming unstructured inputs into interlinked, structured wiki pages rather than searching over raw chunks
Queries hit the wiki layer first, producing coherent, context-rich answers instead of stitched retrieval fragments
A schema file defines naming conventions, page templates, and update rules, keeping the wiki self-consistent as new sources arrive
Several open source implementations now exist for teams ready to move past prototype-grade memory systems

Andrej Karpathy’s LLM Wiki memory method reframes how AI systems store and use knowledge. Instead of treating an LLM as a search engine over chunked documents, the method positions the model as a knowledge compiler: something that reads raw sources once, synthesizes their content, and writes structured wiki pages that persist across sessions. The result is a living knowledge base built and maintained by the LLM itself, answering queries from organized wiki entries rather than raw retrieval results. For practitioners who have watched RAG pipelines produce inconsistent outputs from duplicate chunks and disconnected context windows, the LLM Wiki pattern offers a more disciplined architecture, one that accumulates knowledge over time in a form the model can actually reason over.

Quick Takeaways

The LLM Wiki method gives AI systems persistent, organized memory without requiring a vector database or embedding pipeline
Raw sources stay immutable: the LLM writes to the wiki layer, never to the source files
A schema document is the control surface for the entire system, defining what gets a wiki page and how pages are structured
Query quality improves because the LLM reads coherent wiki pages instead of fragmented retrieval chunks

What Is Karpathy’s LLM Wiki Memory Method?

The method originated in a gist by Karpathy, in which he sketched a system for building and maintaining a personal wiki using an LLM as the primary writer and editor. The core idea is simple but consequential: raw information sources are never queried directly by the end user. Instead, an LLM processes those sources and compiles them into structured wiki pages. Those pages become the primary knowledge layer that answers questions, generates summaries, and surfaces connections between topics.

The method draws on a filing-cabinet analogy. Raw sources are the inbox, a pile of unprocessed documents, transcripts, notes, and articles. The wiki is the organized cabinet, where each drawer contains a well-labeled, cross-referenced page about a specific entity or topic. The LLM is the archivist, reading the inbox and maintaining the cabinet. Once the cabinet is stocked, you never rummage through the inbox again. You open the drawer you need.

The key distinction from a standard notes app with a search function lies in the role of the LLM during ingest. The model does not just index or tag documents. It reads them, identifies entities, merges overlapping information from multiple sources, writes prose summaries, and creates cross-links between related pages. Over time, the wiki becomes a condensed, coherent representation of everything in the raw sources, stripped of redundancy and formatted for fast comprehension.

The method applies to a wide range of use cases: research databases, personal knowledge management, customer support knowledge bases, codebases, and any domain where information accumulates faster than humans can organize it. An extended LLM Wiki v2 pattern, documented in a community gist, builds on the original by adding versioning hooks and conflict-resolution rules for high-volume ingest scenarios.

The Core Problem: Why Stateless RAG Falls Short

Standard retrieval-augmented generation works by splitting documents into chunks, embedding each chunk, storing the embeddings in a vector index, and retrieving the top-k most similar chunks at query time. For simple factual lookups against a small, stable corpus, this works acceptably. For anything requiring synthesis, long-term memory, or multi-hop reasoning, it breaks down in predictable ways.
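As a rough illustration of the retrieval step, here is a minimal sketch of top-k selection by cosine similarity. The `embed` function is an assumption made for the example: it uses bag-of-words counts as a stand-in for a learned embedding model, and the chunk texts are invented.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in for an embedding model: word counts instead of dense vectors.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank every chunk against the query and keep the k most similar.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


chunks = [
    "the deadline for atlas is march",
    "lunch menu for friday",
    "atlas deadline moved to april",
]
results = top_k("atlas deadline", chunks, k=2)
```

In this toy run the retriever hands back both deadline chunks, one saying March and one saying April, and leaves it to the model to decide which is current.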

The first failure mode is context fragmentation. When a document is split into 512-token chunks, each chunk loses the surrounding context that makes it interpretable. A chunk containing “the threshold was set to 0.85” means nothing without the sentence explaining what metric, what model, and what experiment produced that number. The retriever finds the chunk, but the LLM receives an orphaned fact.
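The orphaned-fact effect is easy to reproduce. The sketch below assumes a naive fixed-size splitter and uses whitespace-separated words as a stand-in for tokens; the document text and the chunk size of 12 are invented for illustration.

```python
def chunk_words(text: str, size: int) -> list[str]:
    # Naive fixed-size chunking; words approximate tokens here.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


doc = (
    "We evaluated the spam classifier on the March validation set. "
    "After reviewing precision and recall, the threshold was set to 0.85."
)

chunks = chunk_words(doc, size=12)

# The chunk a retriever would return for a question about the threshold.
hit = next(c for c in chunks if "0.85" in c)

# The retrieved chunk keeps the number but loses what it was a threshold for.
knows_value = "0.85" in hit
knows_subject = "classifier" in hit
```

Here `knows_value` is true and `knows_subject` is false: the chunk boundary falls between the sentence naming the model and the sentence stating the number.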

The second failure mode is knowledge duplication. If fifty meeting transcripts each mention the same project deadline in slightly different phrasing, all fifty chunks survive in the index. A query about that deadline retrieves several contradictory or redundant snippets. The LLM must reconcile them on the fly, often inconsistently across sessions.

The third failure mode is what practitioners increasingly call the stateless trap. Each RAG query starts from scratch. The system has no memory of what it answered yesterday, no ability to track how knowledge evolves, and no mechanism for noticing when a new document contradicts an old one. The index grows, quality degrades, and the system becomes harder to trust.

As one community analysis notes, LLM Wiki systems can also collapse without careful schema design, but the failure mode there is structural rather than fundamental. A poorly designed wiki produces orphaned pages and inconsistent naming. These are fixable. The stateless RAG failure is architectural: no amount of tuning adds a persistent, organized knowledge layer that was never built.

Inside the Three-Layer LLM Wiki Architecture

The architecture has three distinct layers, each with a specific role and a strict boundary against the others.

The first layer is the raw sources directory. This is an append-only store of original documents: PDFs, markdown notes, transcripts, URLs, code files, whatever the knowledge domain requires. Nothing in this layer is edited or deleted. It is the immutable truth layer. If a source is wrong, you add a correction document rather than modifying the original. This preserves a full audit trail and makes the wiki reproducible from scratch.
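One way to enforce the append-only rule is at the file-system level. The following is a minimal sketch, not taken from any particular implementation; the `add_source` helper, the `raw` directory name, and the file names are all hypothetical.

```python
import tempfile
from pathlib import Path


def add_source(raw_dir: Path, name: str, content: str) -> Path:
    # Mode "x" creates the file and raises FileExistsError if it already
    # exists, so an existing source can never be silently overwritten.
    raw_dir.mkdir(parents=True, exist_ok=True)
    path = raw_dir / name
    with open(path, "x", encoding="utf-8") as fh:
        fh.write(content)
    return path


with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw"
    add_source(raw, "2026-01-10-standup.md", "Deadline is March 1.")

    try:
        add_source(raw, "2026-01-10-standup.md", "Deadline is April 15.")
        overwrite_blocked = False
    except FileExistsError:
        overwrite_blocked = True

    # A wrong source is corrected by adding a new document, not by editing.
    add_source(
        raw,
        "2026-01-12-correction.md",
        "Correction to 2026-01-10-standup.md: deadline is April 15.",
    )
    stored = sorted(p.name for p in raw.iterdir())
```

The original transcript and its correction both remain on disk, which is what keeps the audit trail intact.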

The second layer is the wiki directory. This is where the LLM writes. Each file is a markdown page about a specific entity or topic: a person, a project, a concept, a decision, a product. Pages follow a consistent template defined by the schema. They contain a summary, key facts, a list of related pages, and citations back to the raw sources that support each claim. The LLM creates, updates, and merges these pages as new sources arrive.
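A page with those four parts might be modeled as below. This is an illustrative sketch only: the `WikiPage` class, the section headings, the `[[...]]` link syntax, and the `[source: ...]` citation format are assumptions, since the exact template is whatever the schema defines.

```python
from dataclasses import dataclass, field


@dataclass
class WikiPage:
    title: str
    summary: str
    # Each key fact carries the raw source that supports it.
    key_facts: list[tuple[str, str]] = field(default_factory=list)
    related: list[str] = field(default_factory=list)

    def render(self) -> str:
        # Emit the page as markdown with one section per template part.
        lines = [f"# {self.title}", "", "## Summary", self.summary, ""]
        lines.append("## Key facts")
        lines += [f"- {fact} [source: {src}]" for fact, src in self.key_facts]
        lines += ["", "## Related pages"]
        lines += [f"- [[{name}]]" for name in self.related]
        lines += ["", "## Sources"]
        lines += [f"- {src}" for src in sorted({s for _, s in self.key_facts})]
        return "\n".join(lines) + "\n"


page = WikiPage(
    title="Project Atlas",
    summary="Billing migration project.",
    key_facts=[("Deadline is 2026-04-15", "raw/standup.md")],
    related=["Dana"],
)
text = page.render()
```

Keeping the source next to each fact, rather than only in a list at the bottom, makes it possible to trace any single claim back to the raw layer.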

The third layer is the schema document. This is a plain-text or markdown file that tells the LLM the rules of the system: how to name pages, what sections each page type should contain, how to handle conflicts between sources, when to create a new page versus updating an existing one, and what the ingest and query workflows look like step by step.
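In practice the schema is prose that the LLM reads, but the same rules can be mirrored in code so that pages can be checked mechanically. The sketch below is a hypothetical example: the page types, section names, and kebab-case naming rule are assumptions chosen for illustration, not part of the original method.

```python
import re

# Illustrative schema: a naming rule plus required sections per page type.
SCHEMA = {
    "naming": "lowercase-kebab-case",
    "page_types": {
        "entity": ["Summary", "Key facts", "Related pages", "Sources"],
        "decision": ["Summary", "Context", "Outcome", "Sources"],
    },
}


def page_filename(title: str) -> str:
    # Apply the naming convention: lowercase, non-alphanumerics become "-".
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{slug}.md"


def missing_sections(page_text: str, page_type: str) -> list[str]:
    # Compare the "## " headings on the page against the schema's template.
    present = {
        line[3:].strip()
        for line in page_text.splitlines()
        if line.startswith("## ")
    }
    return [s for s in SCHEMA["page_types"][page_type] if s not in present]


filename = page_filename("Project Atlas (2026)")
gaps = missing_sections("# Atlas\n\n## Summary\ntext\n\n## Sources\n- a.md\n", "entity")
```

A check like `missing_sections` gives the maintenance pass something concrete to test pages against.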

💡 Pro Tip: Keep your schema document short and opinionated. A one-page schema with clear naming conventions and three explicit workflows produces more consistent wikis than an elaborate specification trying to cover every edge case. LLMs follow tight constraints more reliably than comprehensive rules with dozens of exceptions.

The llmwiki open source package structures this three-layer model in Python, with separate directories for sources, wiki pages, and the schema config file. It offers a practical reference for teams building their first LLM Wiki system.

From Retrieval to Compilation: LLM as a Knowledge Compiler

The conceptual shift at the center of the LLM Wiki method is treating the model as a compiler, not a retriever. A compiler reads source code and produces an optimized binary. An LLM acting as a knowledge compiler reads raw sources and produces structured, queryable wiki pages. The output is qualitatively different from the input, not a rearrangement of the same text but a synthesized, organized representation of the knowledge the sources contain.

This matters for query quality. When a user asks a question against a standard RAG system, the model receives a context window full of retrieved chunks and must synthesize an answer in real time. The quality of that answer depends on whether the right chunks were retrieved, whether they contain sufficient context, and whether they are internally consistent. In practice, each of these conditions fails regularly.

When a user asks the same question against an LLM Wiki, the model reads a wiki page that an earlier compilation pass already synthesized. The hard interpretive work happened at ingest time. The query-time model receives a clean, structured page and produces a coherent answer from it. Query latency can drop because synthesis is not repeated for every query, answer consistency improves, and the system degrades gracefully as the knowledge base grows, because new sources produce new or updated wiki pages rather than a larger pile of undifferentiated chunks.

The obsidian-wiki framework applies this compiler model inside an Obsidian vault, using an AI agent to run ingest and query workflows over a folder of notes. It demonstrates how the pattern maps onto existing knowledge management tools without requiring dedicated database infrastructure.

Key Workflows: Ingest, Query, and Maintain the Wiki

Three workflows keep an LLM Wiki functional over time: ingest, query, and maintain.

The ingest workflow runs whenever new sources arrive. The LLM reads each new document, identifies the entities and topics it covers, checks whether wiki pages already exist for those entities, and either creates new pages or updates existing ones. Each page update includes a citation to the source document. When a new source contradicts an existing wiki claim, the LLM notes the conflict explicitly rather than silently overwriting, a behavior enforced by the schema.
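The ingest steps above can be sketched as a small loop. This is a simplified illustration under stated assumptions: `toy_extract` stands in for the LLM call that identifies entities and facts, and it parses an invented `Entity | key | value` line format. The `Page` structure and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Page:
    title: str
    facts: dict[str, tuple[str, str]] = field(default_factory=dict)  # key -> (value, source)
    sources: set[str] = field(default_factory=set)
    conflicts: list[str] = field(default_factory=list)


Extractor = Callable[[str], dict[str, dict[str, str]]]


def toy_extract(text: str) -> dict[str, dict[str, str]]:
    # Placeholder for the LLM: reads "Entity | key | value" lines.
    found: dict[str, dict[str, str]] = {}
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:
            title, key, value = parts
            found.setdefault(title, {})[key] = value
    return found


def ingest(source_id: str, text: str, wiki: dict[str, Page], extract: Extractor) -> None:
    for title, facts in extract(text).items():
        page = wiki.setdefault(title, Page(title))  # create the page or reuse it
        page.sources.add(source_id)                 # every update cites its source
        for key, value in facts.items():
            known = page.facts.get(key)
            if known is None:
                page.facts[key] = (value, source_id)
            elif known[0] != value:
                # Record the contradiction instead of silently overwriting.
                page.conflicts.append(
                    f"{key}: {known[0]!r} ({known[1]}) vs {value!r} ({source_id})"
                )


wiki: dict[str, Page] = {}
ingest("raw/standup-01.md", "Project Atlas | deadline | 2026-03-01", wiki, toy_extract)
ingest("raw/standup-02.md", "Project Atlas | deadline | 2026-03-01", wiki, toy_extract)
ingest("raw/standup-03.md", "Project Atlas | deadline | 2026-04-15", wiki, toy_extract)
atlas = wiki["Project Atlas"]
```

After three transcripts there is still one page and one deadline entry. The repeated value adds a citation but no duplicate, and the contradictory value is recorded as a conflict for later resolution.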

The query workflow is the user-facing path. A question arrives, the LLM identifies which wiki pages are most relevant, reads those pages in full, and formulates an answer with citations. For complex multi-hop questions, the LLM follows cross-links between wiki pages, traversing a small knowledge graph before answering. Raw sources are consulted only when a query requires detail that the wiki page summarized away.
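The cross-link traversal can be sketched as a bounded breadth-first walk. The example assumes pages are stored as a title-to-text mapping and that links use an Obsidian-style `[[Page Title]]` syntax; both are assumptions for illustration. Choosing the seed pages, which the LLM would do, is left out.

```python
import re
from collections import deque

LINK = re.compile(r"\[\[([^\]]+)\]\]")


def collect_pages(wiki: dict[str, str], seeds: list[str], max_hops: int = 1) -> list[str]:
    # Breadth-first walk over cross-links, starting from the seed pages
    # and stopping after max_hops link traversals.
    seen: list[str] = []
    queue = deque((s, 0) for s in seeds if s in wiki)
    while queue:
        title, depth = queue.popleft()
        if title in seen:
            continue
        seen.append(title)
        if depth < max_hops:
            for target in LINK.findall(wiki[title]):
                if target in wiki and target not in seen:
                    queue.append((target, depth + 1))
    return seen


wiki = {
    "Atlas": "Led by [[Dana]]. Depends on [[Billing API]].",
    "Dana": "Works on [[Atlas]] and [[Hermes]].",
    "Billing API": "Owned by the platform team.",
    "Hermes": "Paused.",
}
one_hop = collect_pages(wiki, ["Atlas"], max_hops=1)
two_hops = collect_pages(wiki, ["Atlas"], max_hops=2)
```

The hop limit keeps the context handed to the model small: a question about Atlas pulls in Dana and the Billing API, and reaches Hermes only if a second hop is allowed.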

The maintenance workflow handles wiki health over time: finding orphaned pages with no cross-links, identifying stale pages whose sources have been updated, merging near-duplicate pages covering overlapping topics, and running periodic schema-compliance checks. This workflow can run on a schedule or be triggered manually when the raw sources directory receives a batch of new documents.
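The orphan check is the simplest of these to automate. In the sketch below, a page counts as orphaned when no other page links to it and it links to no existing page; that definition, the `[[Page Title]]` link syntax, and the sample pages are assumptions made for the example.

```python
import re

LINK = re.compile(r"\[\[([^\]]+)\]\]")


def find_orphans(wiki: dict[str, str]) -> list[str]:
    # Outbound links per page, ignoring self-links and links to missing pages.
    outbound = {
        title: {t for t in LINK.findall(body) if t in wiki and t != title}
        for title, body in wiki.items()
    }
    # Every page that at least one other page points to.
    linked_to = set().union(*outbound.values())
    return sorted(t for t in wiki if not outbound[t] and t not in linked_to)


wiki = {
    "Atlas": "See [[Dana]].",
    "Dana": "Engineer.",
    "Old Notes": "Nothing linked here.",
    "Draft": "See [[Missing]].",
}
orphans = find_orphans(wiki)
```

`Draft` is reported because its only link points at a page that does not exist, which is also a useful signal that a page was planned but never written.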

💡 Pro Tip: During ingest, instruct the LLM to write a one-line confidence note at the bottom of each wiki page update, flagging whether the source was high-quality, contradictory, or thin on detail. This creates an audit trail that makes maintenance significantly easier when you revisit pages months later.
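If the note follows a fixed format, it can be validated and searched later. A minimal sketch, assuming three hypothetical level names taken from the tip above and an invented `Confidence (source): level` line format:

```python
CONFIDENCE_LEVELS = ("high-quality", "contradictory", "thin")


def with_confidence_note(page_text: str, source_id: str, level: str) -> str:
    # Append a one-line note recording how much the source can be trusted.
    if level not in CONFIDENCE_LEVELS:
        raise ValueError(f"unknown confidence level: {level!r}")
    return f"{page_text.rstrip()}\n\nConfidence ({source_id}): {level}\n"


updated = with_confidence_note(
    "# Project Atlas\n\n## Summary\nBilling migration.\n",
    "raw/standup-03.md",
    "contradictory",
)
```

Restricting the note to a closed set of levels means a later maintenance pass can filter pages by level with a plain text search.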

Projects like AutoSci extend these workflows into research automation, running ingest and synthesis loops over scientific literature to build cumulative knowledge bases capable of answering detailed domain questions. The core three-workflow pattern holds at any scale.

Frequently Asked Questions

Q: What problem does Karpathy’s LLM Wiki memory method solve?

It solves the stateless memory problem in standard RAG systems. Basic retrieval-augmented generation has no persistent, organized knowledge layer: each query starts fresh, knowledge duplicates across chunks, and conflicting information from multiple sources never gets reconciled. The LLM Wiki method replaces that with a compiled wiki the model maintains and queries against, giving AI systems consistent, cumulative memory across sessions.

Q: How does the three-layer LLM Wiki architecture (raw sources, wiki, schema) work?

Raw sources form an immutable, append-only store of original documents. The wiki layer is a directory of LLM-written markdown pages, one per entity or topic, containing summaries, key facts, and citations. The schema is a document defining the rules: how pages are named, what sections they contain, how conflicts are resolved, and what the ingest and query workflows look like step by step.

Q: Why is the LLM Wiki method better for long-term memory than basic RAG?

Because it does the hard interpretive work at ingest time rather than query time. Wiki pages are pre-synthesized, deduplicated, and cross-linked. A query reads a clean, structured page rather than a set of retrieved fragments. As the knowledge base grows, the wiki improves through maintenance workflows, whereas a RAG index degrades as chunk volume increases and duplicates accumulate.

Q: What are some open source tools that implement the Karpathy LLM Wiki pattern?

The llmwiki Python package implements the three-layer model directly. The obsidian-wiki framework applies the pattern inside an Obsidian vault using AI agents. The AutoSci project extends it to scientific literature synthesis. All three provide working ingest and query workflows you can adapt to your domain.

Q: How do you design a schema file for a Karpathy-style LLM Wiki?

Keep it short and specific. Define two to four page types such as entity, topic, event, and decision, each with a simple markdown template. Specify naming conventions, a rule for when to create a new page versus updating an existing one, how to handle source conflicts, and the step-by-step ingest and query workflows the LLM should follow. One well-scoped page produces more consistent results than a comprehensive specification the model will interpret loosely.
