What is RAG? Retrieval-Augmented Generation explained (with examples)

RAG, defined

RAG, short for Retrieval-Augmented Generation, is a technique that lets a large language model answer from your own documents rather than from its training data alone. Before generating a reply, the system retrieves the most relevant passages from a knowledge base and passes them to the model as context, so the answer is grounded in your facts.

In short: retrieval finds the right information, and generation turns it into a fluent, accurate answer. The model contributes language skill; your data contributes the truth.

How RAG works, step by step

First, your documents are split into chunks, and each chunk is converted into a numerical vector (an embedding) and stored in a vector database. When a user asks a question, the question is embedded the same way, and the system retrieves the chunks whose vectors lie closest to it, typically measured by cosine similarity, which serves as a proxy for closeness in meaning. Those chunks are then placed in the prompt alongside the question, and the model generates an answer from them.
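The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the embed function here is a toy word-count vector standing in for a real embedding model, the list called index stands in for a vector database, and all function names and sample documents are made up for the example. The final call to a language model is left out; the sketch stops at the assembled prompt.

```python
import math
import re
from collections import Counter

def chunk_text(text, max_words=40):
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def embed(text):
    """Toy embedding (assumption): sparse word counts instead of a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, index, k=2):
    """Return the k chunks whose vectors are most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def build_prompt(question, chunks):
    """Place the retrieved chunks in the prompt alongside the question."""
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# Hypothetical knowledge base content.
documents = [
    "A refund is available within 30 days of purchase with a receipt.",
    "Standard shipping takes 3 to 5 business days.",
    "Support is open Monday to Friday, 9am to 5pm.",
]

# Indexing: chunk every document and store each chunk with its vector.
index = [(chunk, embed(chunk)) for doc in documents for chunk in chunk_text(doc)]

# Query time: embed the question, retrieve, and build the prompt.
question = "Can I get a refund after purchase?"
top_chunks = retrieve(question, index, k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

In a real system the prompt would then be sent to the model, and the toy pieces would be swapped for an embedding model and a vector store; the overall shape of the pipeline stays the same.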

The practical upside is twofold: answers stay current because you update the knowledge base rather than retrain the model, and you can cite the exact source passage — which builds trust and makes auditing possible.
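Both benefits come down to how chunks are stored. The sketch below is one possible arrangement, assuming an in-memory store: every chunk keeps the ID of the document it came from, so an answer can point back to its source, and re-adding a document under the same ID replaces its old chunks. The class and function names (KnowledgeBase, upsert, cite) and the citation format are illustrative, not taken from any particular library. A real system would also recompute the embeddings of the replaced chunks.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    doc_id: str    # which document this passage came from
    position: int  # index of the chunk within that document
    text: str

class KnowledgeBase:
    """Hypothetical in-memory store mapping each document ID to its chunks."""

    def __init__(self):
        self._chunks = {}

    def upsert(self, doc_id, text, max_words=50):
        """Add a document, or replace every chunk of an existing one."""
        words = text.split()
        self._chunks[doc_id] = [
            Chunk(doc_id, n, " ".join(words[i:i + max_words]))
            for n, i in enumerate(range(0, len(words), max_words))
        ]

    def all_chunks(self):
        return [c for chunks in self._chunks.values() for c in chunks]

def cite(chunk):
    """Format a source reference that can be shown next to an answer."""
    return f"{chunk.doc_id}#chunk-{chunk.position}"

kb = KnowledgeBase()
kb.upsert("returns-policy", "Refunds are accepted within 14 days.")
# The policy changes: editing the document replaces the stale chunk, no retraining.
kb.upsert("returns-policy", "Refunds are accepted within 30 days.")

current = kb.all_chunks()
print(current[0].text, "(source:", cite(current[0]) + ")")
```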

RAG vs fine-tuning

Fine-tuning adjusts the model's internal weights to bake in a style or skill; RAG leaves the model untouched and supplies knowledge at query time. As a rule, use RAG when answers depend on facts that change or must be cited, and fine-tuning when you need a consistent tone, format or specialised behaviour. The two are complementary, not rivals.

For most businesses, RAG is the cheaper and faster starting point: it avoids the cost and data demands of training, and updating knowledge means editing a document and re-indexing it rather than retraining a model.

Where RAG earns its keep

The strongest use cases are knowledge-heavy and high-volume: customer support that answers from your help centre and policies, internal assistants that search contracts and SOPs, and sales tools that pull live product specs. In each, the value is answering correctly from a source of truth rather than guessing.

Getting it right in production

A demo RAG system is easy; a reliable one is engineering. Retrieval quality, chunking strategy, handling of conflicting or stale documents, and guardrails against confident-but-wrong answers all decide whether users trust it. We build RAG with evaluation baked in — measuring answer accuracy and source attribution before anything ships.
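As a rough illustration of what measuring answer accuracy and source attribution can look like, here is a minimal sketch of an evaluation loop. It assumes a small hand-written test set and a function that returns an answer together with the IDs of the sources it used. The names evaluate and fake_rag, the substring-based correctness check, and the test cases are all assumptions made for the example; real evaluations usually need a more robust notion of a correct answer.

```python
def evaluate(cases, answer_fn):
    """Score a RAG system on answer accuracy and source attribution.

    cases: list of dicts with "question", "expected" (a substring a correct
           answer must contain) and "source" (the document that should be cited).
    answer_fn: callable returning (answer_text, list_of_source_ids).
    """
    if not cases:
        raise ValueError("need at least one test case")
    correct = attributed = 0
    for case in cases:
        answer, sources = answer_fn(case["question"])
        if case["expected"].lower() in answer.lower():
            correct += 1
        if case["source"] in sources:
            attributed += 1
    return {"accuracy": correct / len(cases), "attribution": attributed / len(cases)}

# Stand-in for a real pipeline (assumption): canned answers keyed by question.
def fake_rag(question):
    canned = {
        "How long is the refund window?": ("You have 30 days.", ["returns-policy"]),
        "When is support open?": ("Support is open every day.", ["support-hours"]),
    }
    return canned[question]

cases = [
    {"question": "How long is the refund window?", "expected": "30 days", "source": "returns-policy"},
    {"question": "When is support open?", "expected": "Monday to Friday", "source": "support-hours"},
]

scores = evaluate(cases, fake_rag)
print(scores)
```

The second case shows why the two numbers are tracked separately: the stand-in system cites the right document but still gives a wrong answer, which is exactly the confident-but-wrong failure that attribution alone would miss.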