Awesome-Agent-Memory maps the field of LLM memory
A GitHub list tracks systems, benchmarks, and papers on memory for LLMs and multimodal agents.

Awesome-Agent-Memory is a curated GitHub collection with 500 stars and 54 forks that organizes work on long-term memory for LLMs and MLLMs. The repo groups products, tutorials, surveys, benchmarks, and papers around retrieval, context retention, and agent reasoning.
| Signal | Value |
|---|---|
| GitHub stars | 500 |
| Forks | 54 |
| Primary language | Python |
| Repository | TeleAI-UAGI/Awesome-Agent-Memory |
What this repo actually gives you
The value here is not a single model or library. It is a map of the memory stack for agents: what to store, when to retrieve it, how to compress it, and how to judge whether memory helps or hurts. That matters because “memory” in agent systems can mean anything from a chat summary to a graph database to a retrieval pipeline glued into a coding assistant.

The repository’s structure makes that ambiguity easier to work with. It separates products, surveys, benchmarks, and papers, then breaks the research side into nonparametric memory, text memory, graph memory, multimodal memory, parametric memory, agent evolution, continual learning, context engineering, and memory security.
- Products: Claude-Mem, Mem0, Zep, Cognee
- Research buckets: retrieval, graph memory, multimodal memory, parametric memory
- Evaluation focus: plain-text benchmarks, multimodal benchmarks, dynamic simulation environments
- Maintenance signal: open-source resources are bolded and ranked higher in the list
The product section shows where the market is heading
The top of the repo is already useful as a buying guide. Claude-Mem focuses on session capture and compression for coding agents. Mem0 calls itself a universal memory layer for AI agents. Zep uses real-time temporal knowledge graphs, while Cognee pushes a hybrid graph-plus-vector approach for cross-session recall.
That mix tells you something important: the field is splitting into product shapes, not one dominant pattern. Some teams want a plug-in for coding workflows. Others want a memory service. Others want graph-first recall. The repo also highlights Letta, formerly MemGPT, which is one of the better-known efforts in agent memory research and productization.
“Memory is the next frontier for AI systems,” Sam Altman said in OpenAI’s June 2026 post announcing Dreaming.
That quote matters because it matches the direction of the list. The repo is not treating memory as an optional add-on. It is treating memory as a core system layer, closer to retrieval infrastructure than to a chat feature.
Benchmarks matter more than demos
Agent memory has a bad habit of sounding better in demos than it performs in real use. A polished memory demo can remember your name, your stack, and your last task. A benchmark asks harder questions: does the system retrieve the right fact after long gaps, can it avoid stale memories, and does memory actually improve reasoning under pressure?

That is why the benchmark sections are the most useful part of the repo for builders. The list includes plain-text benchmarks, multimodal benchmarks, and dynamic environments, which gives teams a way to test both storage quality and retrieval quality under changing conditions.
- Plain-text benchmarks test long-term recall across text-heavy tasks
- Multimodal benchmarks check whether image, video, and text memories stay aligned
- Dynamic environments measure whether memory still works when the task changes over time
- Simulation environments help expose failures that static datasets hide
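To make the stale-memory question concrete, here is a minimal sketch of the kind of check a benchmark might run. It is not taken from any benchmark in the repo: `Fact`, `NaiveMemory`, `LatestWinsMemory`, and `stale_fact_accuracy` are hypothetical names invented for illustration, and the "conversation" is a three-line toy script.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Fact:
    key: str    # what the fact is about, e.g. "editor"
    value: str  # the remembered value
    turn: int   # conversation turn at which it was stored


class NaiveMemory:
    """Hypothetical toy memory: keeps every fact, returns the first match."""

    def __init__(self) -> None:
        self.facts: List[Fact] = []

    def write(self, fact: Fact) -> None:
        self.facts.append(fact)

    def read(self, key: str) -> Optional[str]:
        for fact in self.facts:
            if fact.key == key:
                return fact.value
        return None


class LatestWinsMemory(NaiveMemory):
    """Same store, but reads return the most recently written match."""

    def read(self, key: str) -> Optional[str]:
        matches = [f for f in self.facts if f.key == key]
        return max(matches, key=lambda f: f.turn).value if matches else None


def stale_fact_accuracy(
    memory: NaiveMemory, script: List[Fact], probes: List[Tuple[str, str]]
) -> float:
    """Replay the script into memory, then score reads against expected values."""
    for fact in script:
        memory.write(fact)
    hits = sum(memory.read(key) == expected for key, expected in probes)
    return hits / len(probes)


SCRIPT = [
    Fact("editor", "vim", turn=1),
    Fact("language", "Python", turn=2),
    Fact("editor", "helix", turn=40),  # later update; the turn-1 value is now stale
]
PROBES = [("editor", "helix"), ("language", "Python")]

naive_score = stale_fact_accuracy(NaiveMemory(), SCRIPT, PROBES)
latest_score = stale_fact_accuracy(LatestWinsMemory(), SCRIPT, PROBES)
print(naive_score, latest_score)
```

Both toy memories would pass a "remember my editor" demo after turn 1. Only the probe after the update separates them, which is the gap between a demo and a benchmark.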
The repo also points to recent commentary and industry posts from OpenAI, Perplexity, and Cloudflare. That matters because memory is moving from research curiosity to product requirement. The hard part is proving that a memory system helps more than it distracts.
What developers should compare before adopting one
If you are building with agent memory, the repo is useful because it makes comparison easier. You can line up the tools by memory model, storage style, and intended workload instead of by marketing language. That is a better way to evaluate them than asking which one has the nicest demo.
Here is the practical comparison lens I would use:
- Claude-Mem: best fit when you want session replay and compression inside a coding workflow
- Mem0: broad agent-memory layer with a strong API story
- Zep (via its Graphiti engine): temporal graph memory for agents that need relationship-aware recall
- Cognee: hybrid graph and vector indexing for cross-session recall
The repo also points to TeleMem, which claims to be a high-performance drop-in replacement for Mem0 and links to its docs and a paper. That makes the list useful for engineers who want to compare implementations, not just read about them. A repo like this saves time because it clusters the field around real systems, not isolated blog posts.
Why this list matters now
Memory is becoming one of the main pressure points in agent design. Context windows keep getting larger, but long-lived agents still need a way to remember facts, preferences, prior actions, and task history without stuffing everything into the prompt. That is where retrieval, compression, graphs, and memory policies start to matter more than raw context length.
Awesome-Agent-Memory gives developers a clean entry point into that problem. If you are building an assistant, a coding agent, or a multimodal workflow, the repo helps you answer a simple question: do you need storage, retrieval, evaluation, or a full memory architecture?
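As a rough illustration of remembering without stuffing everything into the prompt, the sketch below selects only the memories that look relevant to a query and fit a fixed budget. It assumes the simplest possible scoring (shared words) and counts words instead of tokens; `overlap_score` and `select_memories` are hypothetical helpers, not functions from any tool in the list, and real systems would typically use embeddings or a graph instead.

```python
from typing import List


def overlap_score(query: str, memory: str) -> int:
    """Count distinct lowercase words shared by the query and a memory."""
    return len(set(query.lower().split()) & set(memory.lower().split()))


def select_memories(query: str, memories: List[str], budget_words: int) -> List[str]:
    """Pick the highest-scoring memories that fit in a word budget."""
    ranked = sorted(memories, key=lambda m: overlap_score(query, m), reverse=True)
    chosen: List[str] = []
    used = 0
    for memory in ranked:
        if overlap_score(query, memory) == 0:
            break  # list is ranked, so everything after this is irrelevant too
        cost = len(memory.split())
        if used + cost <= budget_words:
            chosen.append(memory)
            used += cost
    return chosen


MEMORIES = [
    "user prefers the pytest test framework",
    "deploy target is staging",
    "user asked to refactor the billing module",
    "favorite color is green",
]

context = select_memories(
    "which test framework does the user prefer", MEMORIES, budget_words=8
)
print(context)
```

The budget is the policy knob here: with 8 words only the best match fits, so the agent sees one memory instead of four.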
The next step for this category is probably not another generic memory layer. It is better evaluation and clearer failure modes, especially for stale facts, privacy, and cross-session drift. If you are working on an agent product now, this repo is worth bookmarking before you pick a memory stack that looks smart in a demo and falls apart after a week.