
Grounding AI: How To Stop Models From Making Things Up

Grounding stops AI from making things up by connecting models to verified, up-to-date sources at query time. Here's a plain-English guide to RAG, tool use, and why grounding matters most when the stakes are high.

When Smart Gets It Wrong

In 2023, a New York lawyer named Steven Schwartz submitted a legal brief that cited six court cases in support of his client's position. The judge asked for copies. Schwartz couldn't produce them - because they didn't exist. His AI assistant had invented them: real-sounding case names, plausible citations, convincing summaries. All fabricated.

What made the case particularly striking was Schwartz's explanation. He told the court he had assumed ChatGPT was a "super search engine" - that when it produced a case, it must have found it somewhere. "It just never occurred to me that it would be making up cases," he said. When he asked the chatbot directly whether one of the cited cases was real, it confirmed that it was. When he asked if any of the others were fake, it told him they were all real and could be found on reputable legal databases. They could not. Judge Kevin Castel of the Southern District of New York described the submissions as "bogus judicial decisions with bogus quotes and bogus internal citations." Schwartz and his firm were fined $5,000 and required to notify every judge who had been falsely named as the author of a fabricated ruling.

It's an extreme example, but the underlying behaviour is not rare. Ask a chatbot about a product feature that was quietly discontinued. Ask it for the latest regulatory guidance in a fast-moving area. Ask it to summarise a document it hasn't actually read. In each case, there's a reasonable chance you'll get a confident, fluent, well-structured answer - and a meaningful chance that answer is wrong.

This isn't a quirk that better AI will eventually fix. It's a consequence of how these systems work. And the solution - at least the most practical and widely used one - is something called grounding.

So: if AI is so smart, why does it make things up? And what does grounding actually do about it?

What Is Grounding?

Grounding, in the context of AI, means connecting a model's outputs to real, verifiable, up-to-date sources - at the moment a question is asked, not at the moment the model was trained.

Think of it like the difference between two students sitting an open-book exam. The first relies entirely on memory. They're sharp, well-read, and their answer sounds authoritative. But if there's a gap in their knowledge, or if the textbook was updated since they last studied, they'll fill it in with something plausible. The second student has their notes in front of them. They might not know everything off the top of their head, but when they write something down, they can point to where it came from.

Grounding turns AI into the second student.

This works because of an important distinction that often gets overlooked: LLMs are significantly better at reading and extracting information from material placed in front of them than they are at recalling facts from memory. When you give a model a document and ask it a question about that document, you are shifting the task from generation - predicting what sounds right - to extraction - finding the answer in the source. That shift alone dramatically reduces the risk of hallucination. The model is no longer working from compressed, potentially outdated memory; it is working from the actual text. (For a deeper look at how generative AI models produce language, see our article Generative AI Models: How Modern AI Systems Produce Language and Content.)

It's worth being clear about what grounding is not, because there's a related concept that often gets conflated with it: fine-tuning. Fine-tuning means retraining a model on new data - actually changing the model's internal knowledge. It's expensive, slow, and doesn't help with anything that happens after the training run ends. Grounding, by contrast, doesn't change the model at all. It changes what the model can see when it's answering your question. The model stays the same; the context it works from changes dynamically, in real time.

That distinction matters enormously in practice. Grounding is what lets an AI answer questions about your internal documents, your current data, your live systems - without those things ever having been part of its training.

Why Models Hallucinate Without Grounding

To understand why grounding is necessary, it helps to have a rough mental model of what a large language model (LLM) is actually doing.

An LLM is not a search engine. It is not retrieving information from a database and presenting it to you. It is generating text - specifically, it is predicting, token by token (roughly word by word), what text is most likely to follow the prompt you gave it. It has learned to do this by processing enormous amounts of text during training, and it has encoded patterns from that text into billions of numerical parameters called weights.

This makes LLMs extraordinarily capable at reasoning, summarising, explaining, translating, and generating text in a given style. But it also means they have two structural weaknesses that don't go away regardless of how capable the model becomes.

The cutoff problem. Training data has a fixed end date. Anything that happened after that date simply isn't in the model's knowledge. Ask it about recent regulatory changes, last quarter's results, or a policy that was updated six months ago, and it won't know - but it may not say that clearly. It may confidently give you the old answer.

The confidence problem. Models don't know what they don't know. A human expert, when asked a question outside their knowledge, usually recognises the boundary and says so. An LLM will often generate a plausible-sounding answer regardless. This is sometimes called hallucination, though that term slightly obscures the mechanism: the model isn't "imagining" things in any experiential sense - it's generating statistically likely text, and in the absence of real information, likely text can be wrong text.

Grounding addresses both of these by giving the model real, current, specific information to work from when it generates a response.

The Main Grounding Techniques

There are several approaches to grounding, and in practice they're often combined.

Retrieval-Augmented Generation (RAG)

RAG is the most widely used grounding technique. The idea is simple: before the model generates a response, a retrieval system searches a knowledge base - your documents, your database, your internal files - and fetches the most relevant chunks of information. Those chunks are then injected into the prompt, so the model is generating its response based on real, specific context rather than memory alone.

In practice, the flow looks like this: a user asks a question; the system searches a knowledge base for the most relevant documents or passages; those passages are passed to the model alongside the original question; the model reads them and extracts an answer. The model is not being asked to remember - it is being asked to read. That is a fundamentally different and more reliable task. (For a detailed explanation of how retrieval systems and embeddings work under the hood, see our article How AI Finds Information: Retrieval Models, Embeddings and RAG Explained.)
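The retrieve-then-read flow above can be sketched in a few lines of Python. This is a deliberately minimal illustration: the keyword-overlap scorer stands in for a real embedding-based semantic search, and the knowledge base and prompt wording are made up for the example.

```python
# Minimal RAG sketch: retrieve relevant passages, then ground the prompt.
# The retriever is a toy keyword-overlap scorer; production systems use
# embedding-based semantic search instead.

def retrieve(query: str, knowledge_base: list[str], top_k: int = 2) -> list[str]:
    """Rank passages by how many words they share with the query (toy scoring)."""
    query_words = set(query.lower().split())
    scored = sorted(
        knowledge_base,
        key=lambda passage: len(query_words & set(passage.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_grounded_prompt(query: str, passages: list[str]) -> str:
    """Inject retrieved passages so the model reads rather than recalls."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using ONLY the sources below. Cite sources by number.\n"
        f"Sources:\n{context}\n\n"
        f"Question: {query}"
    )

# Illustrative internal knowledge base - never part of any training data.
kb = [
    "Annual leave entitlement is 25 days plus public holidays.",
    "Expense claims must be submitted within 30 days of purchase.",
    "Remote work requires line-manager approval.",
]
query = "How many days of annual leave do I get?"
prompt = build_grounded_prompt(query, retrieve(query, kb))
```

The prompt that reaches the model now contains the actual policy text, so answering becomes an extraction task rather than a recall task.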

Think of it like a researcher who, before answering your question, runs a quick search of their filing cabinet and reads the relevant pages first. They're still doing the thinking, but they're doing it with the right material in front of them.

RAG is particularly powerful because it can work with proprietary information - internal documents, client records, policy files - that was never part of the model's training data and never needs to be. It also means the knowledge base can be updated at any time without retraining the model.

Tool Use and Function Calling

This technique gives the model access to live tools - APIs, databases, web search, calculators, and so on - and allows it to call those tools as part of answering a question. Rather than relying on stored knowledge, the model can actively go and get current information.

For example, a model with access to a live financial data feed can retrieve today's exchange rates rather than guessing. A model connected to a company's CRM can look up an actual customer record rather than generating a plausible-sounding one.
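The dispatch loop behind this can be sketched roughly as follows. Everything here is illustrative: `get_fx_rate` is a hypothetical stand-in for a live market-data API, the rate table is fake, and in a real system the JSON tool call would be emitted by the model rather than hard-coded.

```python
# Function-calling sketch: the model is told which tools exist, requests one
# by name with arguments, and the application executes it and feeds the
# result back. Tool names and the rate table below are placeholders.

import json

def get_fx_rate(base: str, quote: str) -> float:
    """Stand-in for a live market-data API call."""
    rates = {("USD", "EUR"): 0.92, ("GBP", "USD"): 1.27}  # fake data
    return rates[(base, quote)]

TOOLS = {"get_fx_rate": get_fx_rate}

def execute_tool_call(call_json: str) -> str:
    """Run the tool the model requested and return its output as text."""
    call = json.loads(call_json)
    result = TOOLS[call["name"]](**call["arguments"])
    return json.dumps({"tool": call["name"], "result": result})

# In a real loop the model emits this JSON; here it is hard-coded.
model_request = '{"name": "get_fx_rate", "arguments": {"base": "USD", "quote": "EUR"}}'
tool_output = execute_tool_call(model_request)  # fed back into the prompt
```

The key point is that the figure in `tool_output` comes from a live lookup at query time, not from the model's frozen training data.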

This is a more powerful form of grounding, but also more complex to build and govern - because you're giving the model the ability to interact with live systems, not just read from a static document store.

Prompt Grounding

The simplest form of grounding: you paste the relevant information directly into the prompt before asking the question. "Here is the document. Based only on this document, please answer the following question."

This works well for one-off tasks and doesn't require any infrastructure. Its limitation is that there's only so much text you can fit into a single prompt (though the limits have expanded significantly as context windows have grown). It also requires a human to select the right information to include, which doesn't scale well.
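As a template, prompt grounding can be as simple as the sketch below; the wording, the instruction to admit gaps, and the sample document are all illustrative.

```python
# Prompt grounding sketch: paste the source text directly into the prompt
# and instruct the model to answer from it alone. No retrieval needed.

def grounded_prompt(document: str, question: str) -> str:
    return (
        "Here is the document. Based ONLY on this document, answer the "
        "question. If the document does not contain the answer, say so.\n\n"
        f"--- DOCUMENT ---\n{document}\n--- END DOCUMENT ---\n\n"
        f"Question: {question}"
    )

policy = "Refunds are available within 14 days of purchase with a receipt."
prompt = grounded_prompt(policy, "What is the refund window?")
```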

One more technique worth naming is structured output. Rather than letting the model generate free-form text, structured output constrains it to respond within a defined format - only using values from an approved list, only citing sources it has been given, only drawing conclusions supported by the provided evidence. This doesn't prevent hallucination outright, but it makes fabrication easier to detect and audit, and it limits what the model can invent.

Real World Examples

The examples below illustrate how these grounding techniques play out in practice.

Enterprise knowledge bases

Many organisations are deploying AI assistants grounded in their internal documentation - HR policy handbooks, IT support guides, compliance manuals. The model doesn't need to know everything; it needs to accurately answer questions about the specific documents the organisation has given it access to. Without grounding, the same assistant would answer HR questions based on general knowledge of employment practice, which may not match the company's actual policies at all.

Customer support

A support chatbot grounded in a company's current product documentation, pricing, and known issues is far more useful - and far less dangerous - than one answering from general training. Without grounding, it might describe features that were deprecated, quote prices that have changed, or suggest workarounds that no longer apply.

Medical and clinical contexts

This is one of the highest-stakes areas. A model assisting with clinical decision support needs to be grounded in current clinical guidelines, drug interaction databases, and patient records - not in general medical knowledge from its training. The consequences of hallucinated dosages or outdated treatment protocols are serious enough that grounding here is less a best practice and more a prerequisite for safe use.

Audit

Audit is an environment where accuracy is essential. An auditor asking an AI assistant to help interpret a revenue recognition policy, assess the appropriateness of a management estimate, or identify disclosures required under a specific accounting standard needs answers grounded in the actual standards, the actual client documents, and the actual audit evidence gathered - not in the model's general understanding of how accounting tends to work.

An ungrounded model helping with audit work might give technically coherent answers that are wrong for the specific client, wrong for the current year's standards, inconsistent with the audit file, or not in line with the firm's internal policies. It won't flag the uncertainty. It will present its output with the same smooth confidence it brings to everything else.

Grounding in audit means connecting AI tools to the actual source material: the engagement-specific documents, the current versions of relevant standards (IFRS, ISA, UK GAAP), the prior year file, the client's accounting policies. When the model cites something, there should be a traceable source. When it doesn't have the information to answer reliably, it should say so - and a well-designed grounded system will surface that, because it will show what it retrieved and what it didn't find.

Investment banking operations

Operations teams in financial services work with high volumes of structured data, time-sensitive information, and processes where errors have direct financial consequences. AI tools in this environment need to be grounded in live systems - current trade data, settlement records, counterparty databases, real-time FX rates - rather than in general knowledge of how financial markets work.

An AI assistant helping an operations analyst investigate a failed settlement needs to see the actual transaction record, the actual counterparty details, the actual message traffic. A model working from general knowledge of settlement processes might give a plausible explanation that simply doesn't match what's in the system. The failure mode isn't dramatic - it's subtle, and that's what makes it dangerous. The answer sounds right.

Grounding in this context means integration with the actual data systems, with proper retrieval of the specific records relevant to the query, and with outputs that can be traced back to the source data.

Grounding in Agentic AI

Increasingly, AI is being deployed agentically: models that don't just answer but act - browsing the web, writing and executing code, sending messages, updating records, triggering workflows.

In agentic settings, grounding becomes even more critical, because the stakes of a wrong answer aren't just a misleading response - they're a real action taken in the world based on fabricated context.

Imagine an AI agent tasked with preparing a reconciliation report, identifying discrepancies, and flagging them for review. If it's working from real, grounded data - live records, actual transactions, verified figures - it can do that reliably. If it's working from its own interpolations about what the numbers probably look like, it might flag phantom discrepancies or miss real ones.

Multi-agent architectures - where multiple AI models work together, each handling part of a larger task - amplify this further. If one agent passes fabricated information to another, the error compounds through the chain. Grounding at each step, with clear provenance of where information came from, is what keeps these systems reliable.
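One way to picture per-step provenance is a message shape that carries its sources with it, so a downstream agent can refuse ungrounded claims rather than compound them. The field names and examples below are entirely illustrative.

```python
# Provenance sketch for multi-agent pipelines: every claim an agent passes
# on carries the source IDs it relied on, and downstream agents reject
# claims with no traceable origin. Message shape is an assumption.

def make_claim(agent: str, text: str, sources: list[str]) -> dict:
    return {"agent": agent, "text": text, "sources": sources}

def accept_claim(claim: dict) -> bool:
    """Refuse to build on claims that cannot be traced to a source."""
    return bool(claim["sources"])

grounded = make_claim("reconciler", "Trade 118 settled on 2024-03-05.",
                      ["settlement-db:trade-118"])
ungrounded = make_claim("reconciler", "Trade 119 probably settled too.", [])
```

A gatekeeping check like `accept_claim` is crude, but it captures the principle: errors stop compounding when each step in the chain demands provenance.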

Limitations and Honest Trade-offs

Grounding is powerful, but it's not magic, and it's worth being clear-eyed about its limitations.

Garbage in, garbage out. Grounding only works as well as the sources it retrieves from. If the knowledge base is out of date, inaccurate, or incomplete, a grounded model will confidently present wrong information - but it will now be your wrong information rather than the model's invented version. Maintaining the quality of the knowledge base is as important as the grounding infrastructure itself.

Retrieval can surface the wrong thing. RAG systems retrieve the chunks of text that are most semantically similar to the query - but similar isn't always the same as relevant. A poorly designed retrieval step might fetch the wrong document, the wrong section, or an outdated version of a policy alongside a current one. The model will use what it's given.

Latency and cost. Retrieval adds steps to the process. For real-time applications, the additional round-trip to search a knowledge base before generating a response adds latency. For high-volume applications, it adds cost. These are engineering trade-offs, not dealbreakers - but they're real considerations in production systems.

Grounded answers are still imperfect. Even a grounded model can misread or misapply the sources it's given. Grounding reduces hallucination significantly - it doesn't eliminate it.

What Good Grounding Looks Like in Practice

A well-grounded AI system has a few characteristics that you can look for:

Source attribution. The model cites where its answers came from. Not vaguely ("based on my knowledge") but specifically - this clause from this document, this figure from this data source. If you can't trace an answer back to a source, you can't verify it.

Confidence signalling. A good grounded system tells you when it doesn't have the information to answer reliably. "Based on the documents provided, I can't find specific guidance on this point" is a more useful output than a confident wrong answer.

Human-in-the-loop checkpoints. In high-stakes workflows - particularly in regulated industries - grounded AI should be designed to support human review, not replace it. The output is a starting point for a professional judgment, not a substitute for one. The best implementations make it easy for a human to trace the AI's reasoning back to its sources and make an informed call.
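Put together, source attribution and confidence signalling might look like the sketch below. The relevance threshold, record shapes, and source IDs are assumptions for illustration, not a standard design.

```python
# Sketch of a grounded answer envelope: every answer carries its sources,
# and when retrieval finds nothing relevant the system says so instead of
# letting the model guess. The threshold value is illustrative.

def answer_with_attribution(query: str,
                            retrieved: list[tuple[str, str, float]]) -> dict:
    """retrieved: list of (source_id, passage, relevance_score) tuples."""
    RELEVANCE_THRESHOLD = 0.5  # assumed cutoff, tuned per system in practice
    usable = [(sid, text) for sid, text, score in retrieved
              if score >= RELEVANCE_THRESHOLD]
    if not usable:
        return {"answer": "Based on the documents provided, I can't find "
                          "specific guidance on this point.",
                "sources": []}
    # A real system would now have the LLM draft an answer from `usable`;
    # here we simply return the passages with their provenance attached.
    return {"answer": " ".join(text for _, text in usable),
            "sources": [sid for sid, _ in usable]}

hit = answer_with_attribution(
    "leave policy", [("hr-handbook-s3", "Annual leave is 25 days.", 0.82)])
miss = answer_with_attribution(
    "crypto policy", [("hr-handbook-s3", "Annual leave is 25 days.", 0.12)])
```

The point of the envelope is auditability: a reviewer can follow `sources` back to the underlying documents, and an empty result produces an honest "can't find" rather than a confident guess.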

So, What Should You Take From This?

Grounding isn't just a technical implementation detail for engineers to worry about. It's the mechanism by which AI becomes trustworthy enough to be genuinely useful in professional settings.

As AI tools proliferate in regulated industries - audit, financial services, legal, healthcare - grounding will increasingly become a compliance expectation rather than a design choice. Regulators and professional standards bodies will want to know: when this system produced an output, what was it based on? Can you show us?


Sources and further reading

The Schwartz case

Maruf, R. (2023, May 28). Lawyer apologizes for fake court citations from ChatGPT. CNN Business. https://www.cnn.com/2023/05/27/business/chat-gpt-avianca-mata-lawyers/index.html

Moran, L. (2023, May 30). Lawyer cites fake cases generated by ChatGPT in legal brief. Legal Dive. https://www.legaldive.com/news/chatgpt-fake-legal-cases-sanctions-generative-ai-steven-schwartz-openai/652731/

Crane-Newman, M. (2023, June 22). Judge sanctions lawyers for brief written by AI with fake citations. CNBC. https://www.cnbc.com/2023/06/22/judge-sanctions-lawyers-whose-ai-written-filing-contained-fake-citations.html

Courthouse News Service (2023, June 22). Sanctions ordered for lawyers who relied on ChatGPT to prepare court brief. https://www.courthousenews.com/sanctions-ordered-for-lawyers-who-relied-on-chatgpt-artificial-intelligence-to-prepare-court-brief/

The original RAG paper (referenced in the RAG section)

Lewis, P., Perez, E., Piktus, A., et al. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems (NeurIPS 2020). https://arxiv.org/abs/2005.11401

Hallucination in financial services (referenced in the financial services and limitations sections)

BizTech Magazine (2025, August). LLM hallucinations: What are the implications for financial institutions? https://biztechmagazine.com/article/2025/08/llm-hallucinations-what-are-implications-financial-institutions

Financial Stability Board (2025). Monitoring adoption of artificial intelligence and related vulnerabilities in financial services. https://www.fsb.org/uploads/P101025.pdf

Agentic AI (referenced in the agentic AI section)

Anthropic (2024). Building effective agents. https://www.anthropic.com/research/building-effective-agents

April 10, 2026
