Introducing CLaRa in Memori
Supercharging AI Memory with CLaRa: Why Raw Text is the Enemy
Author: Memori Team
Date: Dec 21, 2024
https://memori-ts.vercel.app/docs/playground
If you've built a RAG (Retrieval Augmented Generation) pipeline for a chatbot, you've likely hit the "Drunk Recall" problem.
The user asks: "What did we decide about the project?" Your vector DB retrieves:
"Hey guys, good morning." "Has anyone seen my coffee?" "Project Chimera is... wait, I think my mic is off."
Technically, the vector database did its job: it found text containing "project". But your LLM gets a context window filled with noise, irrelevant chatter, and half-finished thoughts. The actual deadline (hidden 30 messages later) gets pushed out of context.
Raw conversational text is terrible for Long-Term Memory.
Enter CLaRa (Contextual Latent Retrieval augmented generation)
In v1.1.0 of memori-js, we are introducing CLaRa, a native optimization pipeline inspired by how human memory works.
Humans don't remember verbatim transcripts of every lunch conversation. We remember the facts.
CLaRa does two things:
1. Compression (The Note-Taker)
When a user speaks, CLaRa doesn't just save the text. It passes it through a "Compressor" LLM (like Llama 3 or Gemini Flash).
Input:
"Mike: Can we make this quick? I've got the server migration at 2 PM. Also, did anyone see the Figma file?"
CLaRa Stores:
Mike has server migration at 2 PM; requested Figma file.
Result:
41% Reduction in storage size (Benchmarked on Llama 3.1).
2x Context Density: Your LLM can "remember" twice as much history.
Noise Elimination: "Can we make this quick?" is deleted forever.
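Under the hood, a compression pass like this boils down to a single prompt. Here's a minimal sketch of the idea (illustrative only, not the actual memori-js internals; callLLM is a hypothetical helper for whatever OpenAI-compatible endpoint you use, e.g. Groq or Ollama):

// Illustrative "note-taker" pass, not the real CLaRa code.
// callLLM is a hypothetical helper: (systemPrompt, userText) => model output.
const COMPRESSOR_PROMPT =
  "You are a note-taker. Rewrite the message as terse factual notes. " +
  "Keep names, times, numbers, and requests. Drop greetings and filler. Output one line.";

async function compressMessage(
  raw: string,
  callLLM: (system: string, user: string) => Promise<string>,
): Promise<string> {
  // "Mike: Can we make this quick? I've got the server migration at 2 PM."
  // comes back as something like "Mike has server migration at 2 PM."
  return callLLM(COMPRESSOR_PROMPT, raw);
}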
2. Reasoning (The Investigator)
When a user asks: "When is the migration?", standard RAG searches for "When is the migration?". If your memory says "Server update at 14:00", vector similarity might miss it.
CLaRa enhances the query before searching. It thinks: User is asking about migration. Keywords: 'Server update', 'Database move', 'Maintenance window'.
It searches for the concept, not just the words.
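Conceptually, this is a query-expansion step in front of the vector lookup. A rough sketch, again illustrative rather than the library's actual implementation (callLLM and vectorSearch are hypothetical stand-ins for your model call and your vector store):

// Illustrative concept search: expand the query with an LLM, then search.
async function reasonedRetrieve(
  question: string,
  callLLM: (system: string, user: string) => Promise<string>,
  vectorSearch: (query: string, topK: number) => Promise<string[]>,
): Promise<string[]> {
  // e.g. "When is the migration?" -> "server update, database move, maintenance window"
  const keywords = await callLLM(
    "List related keywords and synonyms for this question, comma-separated.",
    question,
  );

  // Search with the question plus the expanded concepts, so a memory phrased as
  // "Server update at 14:00" can still match a question about the "migration".
  return vectorSearch(`${question} ${keywords}`, 5);
}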
3. The Result: "Needle in a Haystack" Solved
We benchmarked CLaRa using a Long Conversation Dataset (simulating a messy 30-minute startup meeting). Here is what happened when we switched from Vanilla RAG to CLaRa (powered by Llama 3.1 8B on Groq).
📊 The Numbers
| Metric | Baseline (Vanilla) | CLaRa (Llama 3.1) | Impact |
| --- | --- | --- | --- |
| Storage Footprint | 4,074 chars | 2,384 chars | -41.5% (less noise, cheaper storage) |
| Fact Density | Low | High | LLM context window is used 2x more efficiently |
| Retrieval Accuracy | Failed | Success | Found the specific password hidden in noise |
🖼️ Visual Proof: The "Meeting" Test
We tracked the logs during ingestion. Look at how CLaRa aggressively summarizes a chatty user while keeping the critical data points.
Raw Input (What usually clogs your Vector DB):
Mike: Can we make this quick? I've got the server migration at 2 PM.
Emily: I still haven't received the updated slide deck for the investor pitch.
David: Uh, I think I sent that? Let me check my sent folder... wait, no.
CLaRa Compressed Memory (What actually gets stored):
Mike has server migration at 2 PM; Emily missing investor slide deck; David checking sent folder.
Note: The "Can we make this quick?" fluff is gone. The timestamp "2 PM" is preserved.
The "Detective" Scenario
To prove utility, we ran a "Detective" test.
The Document: A messy transcript where a manager mentions an admin password (BlueSky$99) amidst talk about pizza and gluten allergies.
The User Query: "What are the credentials for the presentation?" (Note: the words "credentials" and "presentation" do not appear in the text.)
Result 1: Standard Vector Search (Fail ❌)
The vector database pulled up chunks about "slide decks" and "gluten free pizza" because they shared generic business context. It missed the password completely.
Result 2: CLaRa (Success ✅)
Reasoning Step: CLaRa's internal thought process:
User is asking for credentials. Keywords to search: 'password', 'login', 'admin', 'access key'.
Retrieval: It found the exact chunk: "Manager: admin password is 'BlueSky$99'".
Answer: The Agent successfully retrieved the password.
Local vs. Cloud: A Hybrid Future
One of the biggest wins in this release of memori-js is the flexibility. During our testing, we verified two distinct architectures:
Production (Speed): verified with Groq (Llama 3.1).
Latency: ~300ms per memory.
Cost: Extremely low / Free tier.
Privacy (Local): verified with Ollama (Gemma 2).
Latency: ~4s per memory (on CPU).
Privacy: 100% offline.
Recommendation: Use memori.queueMemory() to run CLaRa in the background. Your user gets an instant response, while the agent "digests" the memory asynchronously.
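In practice that pattern looks roughly like the sketch below, assuming queueMemory() accepts the raw message text (check the docs for the exact signature):

// Fire-and-forget memory: reply first, let CLaRa digest in the background.
// memori is a Memori instance; generateReply stands in for your normal chat call.
async function handleUserMessage(
  memori: { queueMemory: (text: string) => void },
  userMessage: string,
  generateReply: (msg: string) => Promise<string>,
): Promise<string> {
  // Answer the user immediately; don't await the memory pipeline.
  const reply = await generateReply(userMessage);

  // CLaRa compresses and indexes the memory asynchronously.
  memori.queueMemory(userMessage);

  return reply;
}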
How to use it
It's one configuration object away.
// Assumes the Memori class is exported from the "memori-js" package.
import { Memori } from "memori-js";

const memori = new Memori({
  clara: {
    enableCompression: true,
    enableReasoning: true,
  },
});
CLaRa is available now in memori-js. Stop feeding your agents junk food. Give them memories they can actually use.