What is Retrieval-Augmented Generation (RAG)?

In this tutorial, we will learn about Retrieval-Augmented Generation (RAG), a popular technique, and how it works. We will then build our own RAG chatbot using various LangChain components.

Understanding Retrieval-Augmented Generation

RAG, or Retrieval-Augmented Generation, is a process in which large language models (LLMs) become context-aware by drawing on external data. Instead of producing a generic response, the model can ground its answer in information relevant to your query. In short, the technique combines the capabilities of a pre-trained large language model with an external data source.

RAG was developed to address the limitations of LLMs, such as reliance on outdated training data and the potential for generating inaccurate responses when faced with gaps in knowledge. By integrating retrieval-based techniques with generative-based AI models, RAG aims to provide context-aware answers, instructions, or explanations. 

RAG Overview

How Does RAG Work?

RAG is a multi-step process: we first build the knowledge base and then combine it with the LLM API to generate context-aware responses.

1. Loading the Data

Load the data from sources such as a folder of PDF, DOCX, and TXT files, websites, relational databases, and more.

2. Data Chunking

Break the data into smaller, more manageable pieces so that each piece can be embedded and retrieved on its own.
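As a rough illustration of chunking, here is a minimal sketch that cuts a string into fixed-size character windows with some overlap between neighbors. The function name chunk_text and its parameters are made up for this example; real splitters, such as the one used later in this tutorial, also try to break on natural boundaries like paragraphs and sentences.

```python
def chunk_text(text: str, chunk_size: int = 20, chunk_overlap: int = 5) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    chunk_text is a hypothetical helper for illustration only.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(text, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```

The overlap means the last few characters of one chunk reappear at the start of the next, which reduces the chance that a sentence is cut in half with its meaning lost.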

3. Document Embeddings

Each chunk of data is then converted into a numerical representation known as an embedding. This is done using embedding models that transform the text into vectors.
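To make the idea of "text in, vector out" concrete, here is a toy sketch that counts how often a few chosen words appear in a piece of text. This is not how a real embedding model works (those learn dense vectors that capture meaning), and the VOCABULARY list and embed function are assumptions made up for this example.

```python
from collections import Counter

# Hypothetical toy vocabulary; a real embedding model has no such fixed word list.
VOCABULARY = ["sql", "python", "database", "pandas", "chart"]


def embed(text: str) -> list[float]:
    """Map text to a vector of word counts, one position per vocabulary word."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in VOCABULARY]


vector = embed("Python talks to the database through SQL and SQL only")
print(vector)
```

Each position in the resulting list corresponds to one vocabulary word, so two texts that mention the same words end up with vectors that point in a similar direction.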

4. Building the Vector Database

The embeddings are stored in a vector database, which is designed for efficient similarity searches. This database acts as the knowledge base for the RAG system, allowing for quick retrieval of relevant information based on the user’s query.

5. User Query

When a user submits a query, it is first passed through an embedding model to convert it into an embedded query vector.

6. Vector Search

The embedded query vector is then used to search the vector database. The system retrieves the top-k relevant chunks from the knowledge base by measuring the distance between the query embedding and all the embedded chunks in the database.
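The search step can be sketched in a few lines of plain Python. This example assumes cosine similarity as the closeness measure, which is one common choice (vector databases may use other distance metrics and approximate indexes). The hand-written vectors and the top_k helper are made up for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector: list[float], store: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Score every stored chunk against the query and return the k best texts."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Hypothetical knowledge base: (chunk text, hand-written embedding) pairs.
store = [
    ("chunk about SQL joins", [1.0, 0.0, 0.0]),
    ("chunk about Pandas plots", [0.0, 1.0, 0.0]),
    ("chunk about SQL and Pandas", [0.7, 0.7, 0.0]),
]

results = top_k([1.0, 0.1, 0.0], store, k=2)
print(results)
```

This brute-force loop compares the query with every chunk, which is fine for a handful of vectors; a vector database exists precisely to avoid that full scan on large collections.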

7. Combining the User Query with Retrieved Data

The retrieved chunks are combined with the original user query to provide additional context. 
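In practice, "combining" usually means placing the retrieved chunks and the question into a single prompt string. The sketch below shows one possible layout; the build_prompt function and the prompt wording are assumptions for illustration, not the template used later in this tutorial.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Join retrieved chunks into a context block and append the user question.

    build_prompt is a hypothetical helper; the wording of the template is illustrative.
    """
    context = "\n\n".join(chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


retrieved = [
    "SQL extracts data from relational databases.",
    "Pandas analyzes the results.",
]
prompt_text = build_prompt("Why use SQL in Python?", retrieved)
print(prompt_text)
```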

8. Generating Responses

This combined text is then passed to the LLM API to generate a response. The inclusion of relevant data from the knowledge base helps the LLM produce more accurate, reliable, and context-specific responses.

Getting Started with RAG using LangChain

In this guide, we will learn to build our first RAG application using the popular AI framework LangChain. 

Setting up

First, install all the necessary Python packages. 

pip install langchain
pip install -qU langchain-openai
pip install langchain_chroma
pip install langchainhub
pip install langchain_community
pip install beautifulsoup4

Set up an OpenAI API key to access the LLMs. In our case, we are providing the key to access the new GPT-4o model. 

import os
from langchain_openai import ChatOpenAI
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass()

llm = ChatOpenAI(model="gpt-4o")

Load all the necessary Python packages.

import bs4
from langchain import hub
from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

Document Loader

We will load the KDnuggets tutorial that I recently published. It is about accessing SQL databases using Python. The WebBaseLoader will extract all the text from the web page and remove the HTML tags and other metadata. 

loader = WebBaseLoader("https://www.kdnuggets.com/using-sql-with-python-sqlalchemy-and-pandas")
docs = loader.load()
[Document(page_content='\n\n\n\n  Using SQL with Python: SQLAlchemy and Pandas - KDnuggets....

Splitting the Data

We will now convert one big chunk of text into smaller chunks. 

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)

We currently have 11 chunks of text data.


Storing the Data in Vector Database

Convert the chunks into embeddings using the OpenAI embedding model and store them in the ChromaDB vector store.

vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())

Building LangChain RAG Chain

We have everything ready to create the RAG chain. This chain will combine all the steps required to retrieve the data from the vector store using the query, combine it with the question, add the RAG prompt, and then pass everything to the LLM API. 

1. Convert our vector store client into a retriever so that when we pass a query, it automatically converts the query into an embedding, performs a similarity search, and returns similar text from the database.

2. Pull the RAG prompt from the LangChain hub. This prompt is engineered to perform RAG-related queries. 

retriever = vectorstore.as_retriever()
prompt = hub.pull("rlm/rag-prompt")

3. Create the RAG chain using the retriever, query, RAG prompt, LLM client, and output parser.

rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

This pipeline takes a user query and outputs the response in plain text, removing all metadata.
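If the | syntax looks unfamiliar, it may help to think of the chain as left-to-right function composition. The sketch below is a rough mental model written in plain Python, not LangChain's actual implementation; pipe and every fake_ function are stand-ins invented for this example, and no API call is made.

```python
def pipe(*stages):
    """Compose stages left to right, similar in spirit to the | operator in the chain above."""
    def run(value):
        for stage in stages:
            if isinstance(stage, dict):
                # A dict stage feeds the same input to every branch and collects the results.
                value = {key: branch(value) for key, branch in stage.items()}
            else:
                value = stage(value)
        return value
    return run


# Stand-ins for the real components (all hypothetical).
def fake_retriever(question):
    return ["SQLAlchemy connects Python to SQL databases."]

def passthrough(value):
    return value

def fake_prompt(inputs):
    return f"Context: {inputs['context']}\nQuestion: {inputs['question']}"

def fake_llm(prompt_text):
    # A real chat model returns a message object with content plus metadata.
    return {"content": f"(answer based on) {prompt_text}", "metadata": {"model": "stand-in"}}

def parse_output(message):
    # Keep only the text, dropping the metadata.
    return message["content"]


toy_chain = pipe(
    {"context": fake_retriever, "question": passthrough},
    fake_prompt,
    fake_llm,
    parse_output,
)
answer = toy_chain("How does Python talk to SQL?")
print(answer)
```

The dictionary at the front is the part that does the "augmenting": the same question is sent to the retriever and passed through unchanged, so the prompt receives both the context and the original question.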

Invoking the RAG Chain

Let’s ask a few questions to test our RAG chain. We can start by asking about the author of the blog. 

rag_chain.invoke("What is the author of the blog?")

It is correct. 

'The author of the blog is Abid Ali Awan.'

Let’s ask a question to learn more about the blog. 

rag_chain.invoke("Why should I use SQL in Python?")

As we can see, it has responded accurately.

'You should use SQL in Python to extract and manipulate data stored in relational databases efficiently. Combining SQL with Python using libraries like SQLAlchemy and Pandas allows you to perform detailed data analysis, data visualization, and even machine learning. This integration unlocks the full potential of your data by leveraging the strengths of both SQL for database operations and Python for advanced analytics.'


RAG has become a core technique in modern AI applications, integrating APIs, databases, and other data sources with large language models to deliver more accurate and context-aware responses. By combining the strengths of retrieval-based systems and generative models, RAG makes AI applications more reliable and more adaptable to diverse, changing information needs.

In this tutorial, we learned about Retrieval-Augmented Generation and how it works. We also learned how to build our own RAG-based chatbot using LangChain. If you enjoyed the AI content, please feel free to comment or reach out to me on LinkedIn. I’d love to share more knowledge with you on topics related to AI and LLMs.
