To set up our knowledge base, we begin by installing and importing the necessary Python libraries:
```python
!pip install llama-index
!pip install llama-index-embeddings-huggingface
!pip install peft
!pip install auto-gptq
!pip install optimum
!pip install bitsandbytes
# if not running on Colab, make sure transformers is installed as well
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.postprocessor import SimilarityPostprocessor
```
Next, we configure our knowledge base by defining our embedding model, chunk size, and chunk overlap. We use the ~33M parameter bge-small-en-v1.5 embedding model from BAAI, which is available on the Hugging Face Hub; other embedding models are listed on Hugging Face's MTEB text embedding leaderboard. The following snippet sets up the knowledge base:
```python
# ~33M parameter embedding model from BAAI, pulled from the Hugging Face Hub
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
# no LLM is needed here; we only use the retrieval side of the pipeline
Settings.llm = None
# chunking parameters used when splitting documents
Settings.chunk_size = 256
Settings.chunk_overlap = 25
```
Then, we load our source documents from a folder called "articles", which contains PDF versions of 3 Medium articles on fat tails. Each PDF page is loaded as a separate document; we remove any document containing irrelevant boilerplate text (for example, the "Member-only story" banner), then chunk the remaining documents and store them in a vector index. The code snippet below demonstrates this process:
```python
documents = SimpleDirectoryReader("articles").load_data()

# drop pages that contain boilerplate rather than article content
# (iterate over a copy so removing items doesn't skip any documents)
for doc in list(documents):
    if "Member-only story" in doc.text:
        documents.remove(doc)
        continue
    if "The Data Entrepreneurs" in doc.text:
        documents.remove(doc)
        continue
    if " min read" in doc.text:
        documents.remove(doc)

# chunk the remaining documents and store them in a vector index
index = VectorStoreIndex.from_documents(documents)
```
Next, we create a retriever using LlamaIndex's VectorIndexRetriever and define a query engine that uses it to return relevant chunks for a user query, discarding any chunk whose similarity score falls below 0.5. The following code snippet sets up the retriever and query engine:
```python
# return the top 3 most similar chunks for each query
top_k = 3
retriever = VectorIndexRetriever(index=index, similarity_top_k=top_k)
# keep only chunks whose similarity score is above 0.5
query_engine = RetrieverQueryEngine(retriever=retriever, node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.5)])
```
With our knowledge base and retrieval system in place, we can pass a technical question to the query engine and retrieve the relevant chunks. The code snippet below demonstrates how to use the query engine:
```python
query = "What is fat-tailedness?"
response = query_engine.query(query)
```
The above code returns a response object containing the relevant chunks of text, which can be further processed and formatted for readability.
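For example, a minimal sketch of that post-processing could concatenate the retrieved chunks into a single context string. This assumes the retrieved chunks are exposed on response.source_nodes, as in LlamaIndex's standard response object:
```python
# assumes response.source_nodes holds the retrieved chunks (NodeWithScore objects)
context = "Context:\n"
for node in response.source_nodes:
    context += node.text + "\n\n"
print(context)
```
Iterating over response.source_nodes, rather than indexing up to top_k, also handles the case where the similarity cutoff filters out some of the retrieved chunks.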