Enhance RAG Accuracy with Corrective-RAG (CRAG)
NBD Lite #50: Implement self-reflection for the retrieved documents
All the code used here is present in the RAG-To-Know repository.
Retrieval-augmented generation (RAG) is a system that combines data retrieval with LLM generation to produce accurate responses, so its output quality depends on both the retrieval and the generation parts.
More specifically, RAG accuracy depends on the data quality, the retrieval module, and the generation model. Often, the part that makes or breaks the system is the retrieved document itself.
In previous articles, we have discussed many techniques that can enhance retrieval, which you can read below.
Even so, the retrieval process can still return irrelevant or erroneous data, and the Corrective-RAG (CRAG) framework emerged to help address these limitations.
Corrective-RAG introduces a mechanism for error detection and correction within the retrieval-augmented generation pipeline by identifying inaccurate retrieved documents.
This article will explore building a simple CRAG pipeline that evaluates the retrieved documents. The system will follow the diagram below, and the code is in the repository.
Introduction
As mentioned above, Corrective-RAG, or CRAG, is a framework for improving RAG results using error detection and correction steps. It was first introduced in the Yan et al. (2024) paper, which explores how a lightweight retrieval evaluator can be used to assess the overall quality of retrieved documents and improve the robustness of generation.
These steps are usually applied only to the retrieval stage, where they detect erroneous documents and correct them, but they can be extended to the generation step as well.
The technique is based on a feedback loop that continuously evaluates the quality of the retrieved documents: we pass each document to an evaluator and perform a correction step whenever it fails the evaluation.
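To make the idea concrete, here is a minimal sketch of that loop. It is only an illustration: retrieve, evaluate, and correct are hypothetical placeholder callables, and the pipeline we build later in this article performs a single correction pass (a web search) rather than repeated rounds.
def crag_feedback_loop(query, retrieve, evaluate, correct, max_rounds=3):
    # Illustrative sketch only:
    # retrieve(query) -> list of documents
    # evaluate(query, doc) -> True if the document passes the check
    # correct(query) -> replacement documents (e.g., from a web search)
    documents = retrieve(query)
    for _ in range(max_rounds):
        relevant = [doc for doc in documents if evaluate(query, doc)]
        if relevant:
            # Evaluation passed; use these documents as context for generation.
            return relevant
        # Evaluation failed; run the correction step and try again.
        documents = correct(query)
    return documents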
The evaluation step can leverage advanced techniques such as confidence scoring, consistency checks, and contextual validation to detect potential errors.
Confidence scoring is a mechanism for evaluating the reliability or trustworthiness of a retrieved document or generated response by assigning it a numerical score (see the sketch after this list).
Consistency checks ensure that the retrieved information and the generated response are logically coherent and free from contradictions.
Contextual validation ensures that the retrieved information and generated response are accurate and contextually appropriate.
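As a small illustration of confidence scoring, the sketch below scores a document by the cosine similarity between the query and document embeddings and compares it against a threshold. The sentence-transformers model and the 0.5 cutoff here are illustrative assumptions, not something prescribed by CRAG.
from sentence_transformers import SentenceTransformer, util

# Illustrative confidence-scoring helper; the model and threshold are arbitrary choices.
embedder = SentenceTransformer("all-MiniLM-L6-v2")

def confidence_score(query, document):
    # Encode both texts and use cosine similarity as a rough confidence score.
    query_emb = embedder.encode(query, convert_to_tensor=True)
    doc_emb = embedder.encode(document, convert_to_tensor=True)
    return util.cos_sim(query_emb, doc_emb).item()

def passes_confidence(query, document, threshold=0.5):
    # A document "passes" when its score clears the (arbitrary) threshold.
    return confidence_score(query, document) >= threshold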
But how can we employ the evaluation techniques above in a RAG system? There are many ways to do so, but one of the most common is the LLM-as-a-Judge evaluator.
A previous article covered how LLM-as-a-Judge works; you can refresh the concept by reading the article below.
Of course, you can also use a more deterministic evaluation metric such as ROUGE or BLEU. What matters in the CRAG framework is that there is an evaluation step and a corrective step.
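For instance, a deterministic evaluator could use the rouge-score package to gate documents on lexical overlap with the query. This is only a hedged sketch; the metric and the 0.2 threshold are illustrative choices, not part of the original framework.
from rouge_score import rouge_scorer

# Illustrative deterministic evaluator: lexical overlap between query and document.
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)

def rouge_relevance(query, document, threshold=0.2):
    # rougeL recall: how much of the query's wording is covered by the document.
    score = scorer.score(query, document)["rougeL"].recall
    return "yes" if score >= threshold else "no"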
So, what are the benefits of using the CRAG framework? There are a few notable benefits, including but not limited to:
Improved Accuracy: By detecting and correcting errors in real time, CRAG significantly improves the factual accuracy of generated responses.
Better contextual response: CRAG ensures the generated responses are accurate and contextually appropriate.
Enhanced User Trust: By delivering more accurate, reliable, and contextually appropriate outputs, CRAG builds greater trust with users.
Those are some benefits you can expect from using the CRAG framework. Now, let’s try to build a simple CRAG pipeline.
Building CRAG
Next, we will build a CRAG pipeline that evaluates the retrieved documents and performs a correction step. Note that, for simplicity, we don’t use a feedback loop here to improve the retrieval result; instead, we perform the correction step by searching the web.
We will build on the simple RAG system we put together in the following article.
Let’s start building the CRAG pipeline by adding an evaluation step. In this step, we will use a simple LLM-as-a-Judge to evaluate whether a document is relevant to the query. The output will be either “yes” or “no”.
from litellm import completion  # assuming litellm, which matches the completion() call pattern used here

def grade_document(query, document):
    # Uses the Gemini model to decide if a document is relevant to the query.
    # GEMINI_API_KEY is assumed to be defined earlier (e.g., loaded from the environment).
    prompt = f"""Query: {query}
Document: {document}
Is this document relevant to the query? Answer with "yes" or "no"."""
    response = completion(
        model="gemini/gemini-1.5-flash",
        messages=[{"content": prompt, "role": "user"}],
        api_key=GEMINI_API_KEY
    )
    # Normalize the raw model output to a strict "yes"/"no" label.
    answer = response['choices'][0]['message']['content'].strip().lower()
    return "yes" if "yes" in answer else "no"
For the correction step, we will use a simple web search to retrieve additional context from the internet.
from langchain_community.tools import DuckDuckGoSearchRun

def supplementary_retrieval(query):
    # Performs a web search using DuckDuckGo and returns the result as a string.
    search_tool = DuckDuckGoSearchRun()
    web_result = search_tool.invoke(query)
    return web_result
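To see what the fallback returns, you can call it on its own; the query below simply reuses the example from later in the article.
# Example standalone call; the tool returns raw search-result text as a single string.
print(supplementary_retrieval("What is the insurance for car?")[:300])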
Lastly, we build the CRAG pipeline by combining the components we have constructed so far: semantic search, the evaluator, and the correction step.
def corrective_rag(query, top_k=2):
    # The main CRAG pipeline:
    # 1. Retrieve documents using semantic search.
    # 2. Grade each document using the evaluator.
    # 3. If no relevant document is found, perform a web search.

    # Step 1: Retrieve documents
    results = semantic_search(query, top_k=top_k)
    retrieved_docs = results.get("documents", [])
    print("Initial retrieved documents:")
    for doc in retrieved_docs:
        print(doc)

    # Step 2: Grade each document for relevance (Evaluation step)
    relevant_docs = []
    for doc in retrieved_docs:
        grade = grade_document(query, doc)
        print(f"Grading document (first 60 chars): {doc[:60]}... => {grade}")
        if grade == "yes":
            relevant_docs.append(doc)

    # Step 3: If no relevant document is found, perform supplementary retrieval (Correction step)
    if not relevant_docs:
        print("No relevant documents found; performing supplementary retrieval via web search.")
        supplementary_doc = supplementary_retrieval(query)
        relevant_docs.append(supplementary_doc)
    else:
        print("Using relevant documents from the vector store.")

    # Ensure all elements in relevant_docs are strings
    context = "\n".join([" ".join(doc) if isinstance(doc, list) else doc for doc in relevant_docs])

    # Step 4: Generate final answer using the combined context.
    final_answer = generate_response(query, context)
    return final_answer
You can then try to test the CRAG pipeline using the following code.
query = "What is the insurance for car?"
final_answer = corrective_rag(query)
print("Final Answer:")
print(final_answer)
The result will depend on the retrieved document and the generation model.
As mentioned, you can extend the CRAG pipeline even further by adding a proper feedback loop and applying the evaluation to the generation step as well.
That’s all for now! I hope you learned more about the CRAG framework.
Is there anything else you’d like to discuss? Let’s dive into it together!