Saturday, May 17, 2025
News PouroverAI
Visit PourOver.AI
No Result
View All Result
  • Home
  • AI Tech
  • Business
  • Blockchain
  • Data Science & ML
  • Cloud & Programming
  • Automation
  • Front-Tech
  • Marketing
  • Home
  • AI Tech
  • Business
  • Blockchain
  • Data Science & ML
  • Cloud & Programming
  • Automation
  • Front-Tech
  • Marketing
News PouroverAI
No Result
View All Result

A Guide on 12 Tuning Strategies for Production-Ready RAG Applications | by Leonie Monigatti | Dec, 2023

December 6, 2023
in AI Technology
Reading Time: 6 mins read
0 0
A A
0
Share on FacebookShare on Twitter


How to Improve the Performance of Your Retrieval-Augmented Generation (RAG) Pipeline with These “Hyperparameters” and Tuning Strategies

Tuning Strategies for Retrieval-Augmented Generation Applications

Data Science is an experimental science. It starts with the “No Free Lunch Theorem,” which states that there is no one-size-fits-all algorithm that works best for every problem. And it results in data scientists using experiment tracking systems to help them tune the hyperparameters of their Machine Learning (ML) projects to achieve the best performance.

This article looks at a Retrieval-Augmented Generation (RAG) pipeline through the eyes of a data scientist. It discusses potential “hyperparameters” you can experiment with to improve your RAG pipeline’s performance. Similar to experimentation in Deep Learning, where, e.g., data augmentation techniques are not a hyperparameter but a knob you can tune and experiment with, this article will also cover different strategies you can apply, which are not per se hyperparameters.

This article covers the following “hyperparameters” sorted by their relevant stage. In the ingestion stage of a RAG pipeline, you can achieve performance improvements by:

  • Data cleaning
  • Chunking
  • Embedding models
  • Metadata
  • Multi-indexing
  • Indexing algorithms

And in the inferencing stage (retrieval and generation), you can tune:

  • Query transformations
  • Retrieval parameters
  • Advanced retrieval strategies
  • Re-ranking models

Note that this article covers text-use cases of RAG. For multimodal RAG applications, different considerations may apply.

Ingestion Stage of a RAG Pipeline

The ingestion stage is a preparation step for building a RAG pipeline, similar to the data cleaning and preprocessing steps in an ML pipeline. Usually, the ingestion stage consists of the following steps:

  1. Collect data
  2. Chunk data
  3. Generate vector embeddings of chunks
  4. Store vector embeddings and chunks in a vector database

This section discusses impactful techniques and hyperparameters that you can apply and tune to improve the relevance of the retrieved contexts in the inferencing stage.

Data Cleaning

Like any Data Science pipeline, the quality of your data heavily impacts the outcome in your RAG pipeline [8, 9]. Before moving on to any of the following steps, ensure that your data meets the following criteria:

  • Clean: Apply at least some basic data cleaning techniques commonly used in Natural Language Processing, such as making sure all special characters are encoded correctly.
  • Correct: Make sure your information is consistent and factually accurate to avoid conflicting information confusing your LLM.

Chunking

Chunking your documents is an essential preparation step for your external knowledge source in a RAG pipeline that can impact the performance [1, 8, 9]. It is a technique to generate logically coherent snippets of information, usually by breaking up long documents into smaller sections (but it can also combine smaller snippets into coherent paragraphs).

One consideration you need to make is the choice of the chunking technique. For example, in LangChain, different text splitters split up documents by different logics, such as by characters, tokens, etc. This depends on the type of data you have. For example, you will need to use different chunking techniques if your input data is code vs. if it is a Markdown file.

The ideal length of your chunk (chunk_size) depends on your use case: If your use case is question answering, you may need shorter specific chunks, but if your use case is summarization, you may need longer chunks. Additionally, if a chunk is too short, it might not contain enough context. On the other hand, if a chunk is too long, it might contain too much irrelevant information.

Additionally, you will need to think about a “rolling window” between chunks (overlap) to introduce some additional context.

Embedding Models

Embedding models are at the core of your retrieval. The quality of your embeddings heavily impacts your retrieval results [1, 4]. Usually, the higher the dimensionality of the generated embeddings, the higher the precision of your embeddings.

For an idea of what alternative embedding models are available, you can look at the Massive Text Embedding Benchmark (MTEB) Leaderboard, which covers 164 text embedding models (at the time of this writing).

While you can use general-purpose embedding models out-of-the-box, it may make sense to fine-tune your embedding model to your specific use case in some cases to avoid out-of-domain issues later on [9]. According to experiments conducted by LlamaIndex, fine-tuning your embedding model can lead to a 5–10% performance increase in retrieval evaluation metrics [2].

Note that you cannot fine-tune all embedding models (e.g., OpenAI’s text-ebmedding-ada-002 can’t be fine-tuned at the moment).

Metadata

When you store vector embeddings in a vector database, some vector databases let you store them together with metadata (or data that is not vectorized). Annotating vector embeddings with metadata can be helpful for additional post-processing of the search results, such as metadata filtering [1, 3, 8, 9]. For example, you could add metadata, such as the date, chapter, or subchapter reference.

Multi-indexing

If the metadata is not sufficient enough to provide additional information to separate different types of context logically, you may want to experiment with multiple indexes [1, 9]. For example, you can use different indexes for different types of documents. Note that you will have to incorporate some index routing at retrieval time [1, 9]. If you are interested in a deeper dive into metadata and separate collections, you might want to learn more about the concept of native multi-tenancy.

Indexing Algorithms

To enable lightning-fast similarity search at scale, vector databases and vector indexing libraries use an Approximate Nearest Neighbor (ANN) search instead of a k-nearest neighbor (kNN) search. As the name suggests, ANN algorithms approximate the nearest neighbors and thus can be less precise than a kNN algorithm.

There are different ANN algorithms you could experiment with, such as Facebook Faiss (clustering), Spotify Annoy (trees), Google ScaNN (vector compression), and HNSWLIB (proximity graphs). Also, many of these ANN algorithms have some parameters you could tune, such as ef, efConstruction, and maxConnections for HNSW [1].

Additionally, you can enable vector compression for these indexing algorithms. Analogous to ANN algorithms, you will lose some precision with vector compression. However, depending on the choice of the vector compression algorithm and its tuning, you can optimize this as well.

However, in practice, these parameters are already tuned by research teams of vector databases and vector indexing libraries during benchmarking experiments and not by developers of RAG systems. However, if you want to experiment with these parameters to squeeze out the last bits of performance, I recommend this article as a starting point:

Inference Stage of a RAG Pipeline

The main components of the RAG pipeline are the retrieval and the generative components. This section mainly discusses strategies to improve the retrieval (Query transformations, retrieval parameters, advanced retrieval strategies, and re-ranking models) as this is the more impactful component of the two. But it also briefly touches on some strategies to improve the generation (LLM and prompt engineering).

Query Transformations

Since the search query to retrieve additional context in a RAG pipeline is also embedded into the vector space, its phrasing can also impact the search results. Thus, if your search query doesn’t result in satisfactory search results, you can experiment with various query transformation techniques [5, 8, 9], such as:

  • Rephrasing: Use an LLM to rephrase the query and try again.
  • Hypothetical Document Embeddings (HyDE): Use an LLM to generate a hypothetical response to the search query and use both for retrieval.
  • Sub-queries: Break down longer queries into multiple shorter queries.

Retrieval Parameters

The retrieval is an essential component of the RAG pipeline. The first consideration is whether semantic search will be sufficient for your use case or if you want to experiment with hybrid search.

In the latter case, you need to experiment with weighting the aggregation of sparse and dense retrieval methods in hybrid search [1, 4, 9]. Thus, tuning the parameter alpha, which controls the weighting between semantic (alpha = 1) and keyword-based search (alpha = 0), will become necessary.

Also, the number of search results to retrieve will play an essential role. The number of retrieved contexts will impact the length of the used context window (see Prompt Engineering). Also, if you are using a re-ranking model, you need to consider how many contexts to input to the model (see Re-ranking models).

Note, while the used similarity measure for semantic search is a parameter you can change, you should not experiment with it but instead set it according to the used embedding model (e.g., text-embedding-ada-002 supports cosine similarity or multi-qa-MiniLM-l6-cos-v1 supports cosine similarity, dot product, and Euclidean distance).

Advanced Retrieval Strategies

This section could technically be its own article. For this overview, we will keep this as concise as possible. For an in-depth explanation of the following techniques, I recommend this DeepLearning.AI course:

The underlying idea of this section is that the chunks for retrieval shouldn’t necessarily be the same chunks used for the generation. Ideally, you…



Source link

Tags: applicationsDecGuideLeonieMonigattiProductionReadyRAGstrategiesTuning
Previous Post

DVYE: More Risky Than It Seems, Uncertain Dividend Trends

Next Post

FRONTECH MICROPHONE 🎤

Related Posts

How insurance companies can use synthetic data to fight bias
AI Technology

How insurance companies can use synthetic data to fight bias

June 10, 2024
From Low-Level to High-Level Tasks: Scaling Fine-Tuning with the ANDROIDCONTROL Dataset
AI Technology

From Low-Level to High-Level Tasks: Scaling Fine-Tuning with the ANDROIDCONTROL Dataset

June 10, 2024
How Game Theory Can Make AI More Reliable
AI Technology

How Game Theory Can Make AI More Reliable

June 9, 2024
Decoding Decoder-Only Transformers: Insights from Google DeepMind’s Paper
AI Technology

Decoding Decoder-Only Transformers: Insights from Google DeepMind’s Paper

June 9, 2024
Buffer of Thoughts (BoT): A Novel Thought-Augmented Reasoning AI Approach for Enhancing Accuracy, Efficiency, and Robustness of LLMs
AI Technology

Buffer of Thoughts (BoT): A Novel Thought-Augmented Reasoning AI Approach for Enhancing Accuracy, Efficiency, and Robustness of LLMs

June 9, 2024
Deciphering Doubt: Navigating Uncertainty in LLM Responses
AI Technology

Deciphering Doubt: Navigating Uncertainty in LLM Responses

June 9, 2024
Next Post
FRONTECH MICROPHONE 🎤

FRONTECH MICROPHONE 🎤

10 Ways Robotics and Automation will change YOUR Future | Technology 2025 and BEYOND

10 Ways Robotics and Automation will change YOUR Future | Technology 2025 and BEYOND

Machine Learning Interview Questions & Answers | Machine Learning Interview Preparation |Simplilearn

Machine Learning Interview Questions & Answers | Machine Learning Interview Preparation |Simplilearn

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

  • Trending
  • Comments
  • Latest
Is C.AI Down? Here Is What To Do Now

Is C.AI Down? Here Is What To Do Now

January 10, 2024
Porfo: Revolutionizing the Crypto Wallet Landscape

Porfo: Revolutionizing the Crypto Wallet Landscape

October 9, 2023
23 Plagiarism Facts and Statistics to Analyze Latest Trends

23 Plagiarism Facts and Statistics to Analyze Latest Trends

June 4, 2024
A Complete Guide to BERT with Code | by Bradney Smith | May, 2024

A Complete Guide to BERT with Code | by Bradney Smith | May, 2024

May 19, 2024
Part 1: ABAP RESTful Application Programming Model (RAP) – Introduction

Part 1: ABAP RESTful Application Programming Model (RAP) – Introduction

November 20, 2023
Saginaw HMI Enclosures and Suspension Arm Systems from AutomationDirect – Library.Automationdirect.com

Saginaw HMI Enclosures and Suspension Arm Systems from AutomationDirect – Library.Automationdirect.com

December 6, 2023
Can You Guess What Percentage Of Their Wealth The Rich Keep In Cash?

Can You Guess What Percentage Of Their Wealth The Rich Keep In Cash?

June 10, 2024
AI Compared: Which Assistant Is the Best?

AI Compared: Which Assistant Is the Best?

June 10, 2024
How insurance companies can use synthetic data to fight bias

How insurance companies can use synthetic data to fight bias

June 10, 2024
5 SLA metrics you should be monitoring

5 SLA metrics you should be monitoring

June 10, 2024
From Low-Level to High-Level Tasks: Scaling Fine-Tuning with the ANDROIDCONTROL Dataset

From Low-Level to High-Level Tasks: Scaling Fine-Tuning with the ANDROIDCONTROL Dataset

June 10, 2024
UGRO Capital: Targeting to hit milestone of Rs 20,000 cr loan book in 8-10 quarters: Shachindra Nath

UGRO Capital: Targeting to hit milestone of Rs 20,000 cr loan book in 8-10 quarters: Shachindra Nath

June 10, 2024
Facebook Twitter LinkedIn Pinterest RSS
News PouroverAI

The latest news and updates about the AI Technology and Latest Tech Updates around the world... PouroverAI keeps you in the loop.

CATEGORIES

  • AI Technology
  • Automation
  • Blockchain
  • Business
  • Cloud & Programming
  • Data Science & ML
  • Digital Marketing
  • Front-Tech
  • Uncategorized

SITEMAP

  • Disclaimer
  • Privacy Policy
  • DMCA
  • Cookie Privacy Policy
  • Terms and Conditions
  • Contact us

Copyright © 2023 PouroverAI News.
PouroverAI News

No Result
View All Result
  • Home
  • AI Tech
  • Business
  • Blockchain
  • Data Science & ML
  • Cloud & Programming
  • Automation
  • Front-Tech
  • Marketing

Copyright © 2023 PouroverAI News.
PouroverAI News

Welcome Back!

Login to your account below

Forgotten Password? Sign Up

Create New Account!

Fill the forms bellow to register

All fields are required. Log In

Retrieve your password

Please enter your username or email address to reset your password.

Log In