Why Enterprise AI Gets Your Company-Specific Questions Wrong
Only 5% of AI projects reach production with sustained value, and 42% of companies have abandoned most of their AI initiatives (McKinsey, 2025). 47% of enterprise AI users have made a major business decision based on hallucinated content (Kamiwaza, 2024). The failure is not model capability; it is missing organisational context. RAG delivers a roughly 60% relative accuracy improvement on complex enterprise queries (Microsoft Research). Persistent memory closes the remaining gap for knowledge that was never documented.
Six months after an enterprise AI deployment, the same conversation happens in organisations everywhere. The platform works — employees are using it, outputs are fluent and confident, the demo looked great. But the questions that actually matter at work are not getting answered well. What is our standard renewal process for EMEA clients? Which version of the product is in the regulated market pilot? What did the compliance team decide about the data residency issue? The AI either guesses plausibly and incorrectly, or admits it does not know.
Both outcomes erode trust faster than not deploying the platform at all. McKinsey's 2025 State of AI report found that 42% of companies had abandoned most of their AI initiatives — up from 17% in 2024 — and only 5% of AI projects reached production with sustained value (McKinsey, May 2025). Among those that deployed, only 39% reported measurable EBIT impact at the enterprise level. These numbers describe a technology that is underperforming at scale, and the underperformance has a consistent underlying cause: the AI employees are using knows everything in general and nothing about the organisation specifically.
“The key insight behind RAG is that parametric memory alone — what the model learned during training — is insufficient for knowledge-intensive tasks. Explicitly retrieving relevant documents at inference time and conditioning the model's output on that retrieved content fundamentally changes what the system can answer reliably.”
Why Does the Most Capable AI Model Fail on Internal Questions?
Every major language model — GPT-4o, Claude, Gemini, Llama — was trained on the largest assemblage of human text ever compiled. That training data includes most of what humanity has written and published: the history of contract law, the structure of GAAP accounting, the syntax of seventeen programming languages, and the text of the EU AI Act. The model can explain risk classification tiers, draft a marketing brief in three styles simultaneously, and summarise a hundred-page report in three minutes.
What it does not know, and cannot know, is anything that has never appeared in a public dataset. The specific regulatory interpretation your legal team settled on after eighteen months of work with compliance counsel. The reason the architecture is structured the way it is — not the structure itself, which might be documented, but the constraint that shaped it, resolved in a meeting four years ago. The internal shorthand your team uses that means something specific in your context and nothing elsewhere. The decision made last Tuesday that changed the direction of the product.
None of this information is in any training dataset. It cannot be retrieved through better prompt engineering or unlocked by switching to a more capable model. Research on knowledge cutoffs and LLM information boundaries confirms what practitioners already know from daily use: the model's knowledge ends where public information ends, and most of what matters inside an organisation is private by definition.
How Hallucinations Become Business Decisions
The failure mode is more damaging than simple ignorance. A model that does not know something can say so. A model that does not know something but has been trained to be helpful and comprehensive will often construct a plausible-sounding answer from general knowledge and present it with the same confident tone it uses for facts. A 2024 survey of enterprise AI users found that 47% had made at least one major business decision based on hallucinated content (Kamiwaza, 2024). The hallucinations were not obvious errors — they were answers that sounded like they could be right, in the context of organisations the model knew nothing about.
The practical consequence is that enterprise AI deployments get used for tasks where organisational specificity does not matter: generating first drafts of external communications, reformatting documents, summarising industry news. These are useful tasks, but they are not the tasks that would transform how organisations work. The transformative applications — answering questions about internal processes, surfacing relevant past decisions, connecting institutional knowledge to current problems — require context the baseline model simply does not have.
Figure: AI accuracy on enterprise questions, by context architecture. Approximate accuracy on company-specific queries at each deployment tier. Sources: Microsoft Research, AWS, McKinsey 2025.
Fix One: What Retrieval-Augmented Generation Actually Does
Retrieval-Augmented Generation was formally described in a 2020 paper by Lewis and colleagues at Facebook AI Research: the core idea was to pair the language model with a retrieval system — rather than answering from training alone, retrieve relevant documents first and inject them into the model's context for each query (Lewis et al., NeurIPS 2020). The approach improved accuracy on knowledge-intensive tasks significantly, setting state-of-the-art on multiple open-domain question-answering benchmarks at the time of publication.
Applied to enterprise AI, RAG works as follows: the organisation builds a knowledge base by uploading internal documents — policies, procedures, product specifications, client briefings, regulatory guidance, past decisions. Those documents are chunked and converted into vector embeddings. When an employee asks a question, the question is also converted to a vector, and the knowledge base is searched for segments with high semantic similarity. The most relevant segments are retrieved and injected into the model's context before inference. The model generates its answer grounded in those retrieved segments and cites the source documents.
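The retrieval step is easier to see in code. The sketch below is a minimal, self-contained illustration of that flow: document chunks are embedded, the query is embedded the same way, the closest chunks are retrieved by cosine similarity, and the result is injected into the prompt with source IDs so the answer can cite them. The `embed` function is a toy stand-in for a real embedding model, and the names and example documents are illustrative rather than any particular product's pipeline; a production system would add the hybrid search, re-ranking, and confidence scoring discussed below.

```python
from dataclasses import dataclass
import math

@dataclass
class Chunk:
    doc_id: str          # source document, kept so the answer can cite it
    text: str
    vector: list[float]

def embed(text: str) -> list[float]:
    # Stand-in for a real embedding model (e.g. a sentence-transformer).
    # Here: a normalised character-frequency vector, just to keep the sketch runnable.
    vec = [0.0] * 64
    for ch in text.lower():
        vec[ord(ch) % 64] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already normalised, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))

def retrieve(query: str, chunks: list[Chunk], top_k: int = 3) -> list[Chunk]:
    # Rank every chunk by semantic similarity to the query and keep the top_k.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, c.vector), reverse=True)[:top_k]

def build_prompt(query: str, retrieved: list[Chunk]) -> str:
    # Inject the retrieved segments, with their sources, ahead of the question
    # so the model answers from them rather than from training data alone.
    context = "\n\n".join(f"[{c.doc_id}] {c.text}" for c in retrieved)
    return (
        "Answer using only the context below. Cite the bracketed source IDs.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

# Usage: index internal documents once, then retrieve per query.
kb = [
    Chunk("policy-emea-renewals", "EMEA renewals above €50K require CCO sign-off.", []),
    Chunk("pilot-brief", "The regulated-market pilot runs product version 4.2.", []),
]
for c in kb:
    c.vector = embed(c.text)

query = "What is our EMEA renewal approval process?"
print(build_prompt(query, retrieve(query, kb)))
```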
The accuracy improvement is substantial and well-documented. Microsoft Research's benchmarking of GraphRAG found 80% accuracy on complex enterprise queries compared to approximately 50% for the same queries without retrieval — a 60% relative improvement (Microsoft Research, 2024). AWS research on hybrid search and re-ranking pipelines reported hallucination rate reductions of 40–60% compared to baseline inference. RAG also solves the training cutoff problem for documented information: your most recent regulatory guidance supersedes anything the model learned during training, because the answer comes from your documents, not from training data.
The detailed post on how RAG works in enterprise deployments covers the full five-stage retrieval pipeline — chunking, embedding, vector search, re-ranking, and context injection — and the confidence scoring that makes source citations useful rather than decorative.
What RAG Cannot Fix
RAG retrieves documents. It has no mechanism to surface knowledge that was never documented — and most organisational knowledge is not documented. Research across multiple knowledge management studies finds that approximately 42% of institutional knowledge resides solely with individual employees — not in any system, document, or accessible format. The same research estimates that large companies lose an average of $47 million per year in productivity from inefficient knowledge sharing, not counting the cost of decisions made with incomplete context.
Even documented knowledge has a staleness problem. The policy from 2022 indexed in your knowledge base may have been superseded by a decision made last quarter that was communicated verbally and never formally updated. RAG will retrieve and surface that document accurately — but it cannot know the document is stale. There is also a structural gap in what organisations document at all: decisions get recorded in meeting notes if notes are taken, but the context behind the decision — why that choice, what constraints ruled out alternatives, what would trigger a revisit — almost never appears in any document.
Fix Two: What Persistent Memory Captures That Documents Cannot
MemGPT, published in 2023, demonstrated that a virtual context management system allowing the model to manage what information it holds in working memory versus external storage could maintain performance on tasks that exceeded the model's context window (Packer et al., arXiv 2023). More recently, Mem0 and similar production memory systems have demonstrated that structured, persistent memory architectures reduce latency and cost while maintaining or improving answer quality compared to full-context approaches.
The practical distinction matters. A knowledge base built from documents answers questions about things that were written down. A persistent memory system that continuously extracts structured information from interactions builds knowledge about things that were said, decided, inferred, and expressed in context — even if no one ever formally documented them. The two systems are complementary: RAG covers the documented fraction of organisational knowledge, persistent memory covers the undocumented remainder.
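To make the distinction concrete, here is a minimal sketch of the kind of structured record a memory layer might extract from a single remark in a meeting or chat. The field names, the example source identifier, and the `extract_memories` keyword heuristic are illustrative assumptions, not Engram's schema; in a real system the extraction step would be performed by a language model rather than a keyword rule.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class MemoryRecord:
    kind: str        # e.g. "fact", "decision", "preference"
    content: str     # the extracted statement
    source: str      # where it was said (meeting, chat thread, document)
    captured_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def extract_memories(utterance: str, source: str) -> list[MemoryRecord]:
    # Toy keyword heuristic standing in for LLM-based extraction: a production
    # system would ask a model to classify and normalise the statement.
    records = []
    lowered = utterance.lower()
    if "decided" in lowered or "agreed" in lowered:
        records.append(MemoryRecord("decision", utterance, source))
    elif "prefers" in lowered:
        records.append(MemoryRecord("preference", utterance, source))
    return records

# A decision voiced in a meeting becomes a queryable record even though
# no one ever writes it into a policy document.
print(extract_memories(
    "Legal decided on EU data residency for all enterprise tiers.",
    source="compliance-sync-notes",
))
```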
How Memory Maturity Keeps the Context Current
| Memory Kind | What It Captures | Example | Default Lifetime |
|---|---|---|---|
| Facts | Persistent truths about the organisation | "The company operates across 14 markets" | Crystallised (permanent) |
| Entities | Clients, products, people, teams and their attributes | Client renewal date; product version in pilot | Consolidated (30 days+) |
| Decisions | Choices made, with rationale and who made them | "Legal decided on EU data residency for all enterprise tiers — Q3 2025" | Consolidated to Crystallised |
| Procedures | How things actually get done — workflows, approvals, escalation paths | "EMEA renewals require CCO sign-off above €50K" | Consolidated (30 days+) |
| Preferences | Individual working styles and communication habits | "Sarah prefers bullet-point summaries; Ahmed reviews drafts before sending" | Working (24h) → Consolidated |
| Context | Situational awareness relevant to current period | "Team is in budget freeze until Q3 2026" | Ephemeral (1h) → Working |
| Insights | Patterns surfaced from accumulated usage | "Sales team most commonly asks about competitor pricing on Wednesday afternoons" | Consolidated (30 days+) |
| Relationships | Dependencies and connections between entities | "Client X is connected to Partner Y through the regional distribution agreement" | Consolidated to Crystallised |
Not everything extracted from every conversation deserves indefinite retention — that produces a memory store filled with outdated context that actively misleads rather than helps. Engram models four maturity stages: ephemeral memories (one-hour TTL) for passing references in a single conversation; working memories (24-hour TTL) for context recalled at least once; consolidated memories (30-day TTL) for information referenced repeatedly across multiple sessions; and crystallised memories — permanent foundational knowledge that survives staff turnover and remains relevant across years.
Promotion between stages happens automatically based on recall patterns. "Sarah prefers bullet-point summaries" starts as ephemeral context, promotes to working memory when referenced again, and crystallises once consistently observed across many sessions. Project-specific context from last quarter demotes naturally as it goes unreferenced. The system does not require manual curation to stay current — it ages out stale context the same way it ages up frequently recalled context.
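A minimal sketch of how TTL-based maturity stages and recall-driven promotion could be modelled is shown below. The stage names and TTLs follow the description above; the promotion thresholds, field names, and `recall`/`is_expired` methods are assumptions made for illustration, not Engram's actual implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Maturity stages and their TTLs, as described above.
STAGES = ["ephemeral", "working", "consolidated", "crystallised"]
TTL = {
    "ephemeral": timedelta(hours=1),
    "working": timedelta(hours=24),
    "consolidated": timedelta(days=30),
    "crystallised": None,  # permanent
}
# Recall counts at which a memory is promoted to the next stage (assumed values).
PROMOTE_AT = {"ephemeral": 1, "working": 3, "consolidated": 10}

@dataclass
class Memory:
    content: str
    stage: str = "ephemeral"
    recall_count: int = 0
    last_recalled: datetime | None = None

    def recall(self, now: datetime) -> None:
        # Each recall counts toward promotion; enough recalls move the memory up a stage.
        self.recall_count += 1
        self.last_recalled = now
        threshold = PROMOTE_AT.get(self.stage)
        if threshold is not None and self.recall_count >= threshold:
            self.stage = STAGES[STAGES.index(self.stage) + 1]
            self.recall_count = 0  # start counting again toward the next stage

    def is_expired(self, now: datetime) -> bool:
        # A memory that is not recalled within its stage's TTL ages out naturally.
        ttl = TTL[self.stage]
        return ttl is not None and (now - (self.last_recalled or now)) > ttl

now = datetime.now(timezone.utc)
m = Memory("Sarah prefers bullet-point summaries", last_recalled=now)
m.recall(now)                                   # referenced again: promoted
print(m.stage)                                  # "working"
print(m.is_expired(now + timedelta(days=2)))    # True if never recalled again
```

In a real deployment, expiry checks and promotions would more likely run as a periodic maintenance pass over the memory store than inline on every recall, but the ageing behaviour is the same.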
Why Enterprise AI Abandonment Is a Context Problem, Not a Model Problem
The McKinsey finding that only 5% of AI projects reach production with sustained value reflects what happens when organisations deploy a baseline model and expect it to handle enterprise-specific queries. The accuracy gap between "performs well on general knowledge" and "performs well on questions about your organisation" is large enough to produce exactly the frustrated abandonment those numbers describe. Switching to a newer or larger model does not close the gap — the problem is the absence of organisational context, not insufficient general intelligence.
The compounding benefit of persistent memory is easy to underestimate. A knowledge base provides a floor — your documented knowledge, available from day one. Persistent memory improves continuously as the system accumulates interactions. Six months into a deployment with active memory, the AI's accuracy on internal questions has improved not because the model changed, but because the memory store has built up a rich map of the organisation's context, decisions, entities, and preferences.
The same compounding effect applies to staff transitions. When an employee leaves, their institutional knowledge typically leaves with them. With a persistent memory layer that has been crystallising their expressed knowledge across interactions, the core of that context remains accessible. A new employee's AI assistant starts with accumulated organisational intelligence from day one, rather than from scratch — producing measurable ramp-up time reductions for new hires.
The detailed post on Engram's memory architecture covers memory kinds, maturity stages, gravity wells, and cross-product scope in depth. For organisations assessing whether their current AI deployment is operating at the baseline tier or the full context tier, the answer is usually visible in the quality of responses to internal process questions — the gap between general-knowledge performance and internal-question performance is the measure of how much organisational context the system currently has access to.
