News PouroverAI
A beginner’s guide to building a Retrieval Augmented Generation (RAG) application from scratch | by Bill Chambers

March 12, 2024
in Data Science & ML
Learn critical knowledge for building AI apps, in plain English

Retrieval Augmented Generation, or RAG, is all the rage these days because it introduces a serious new capability to large language models like OpenAI’s GPT-4: the ability to use and leverage your own data.

This post will teach you the fundamental intuition behind RAG while providing a simple tutorial to help you get started.

There’s so much noise in the AI space and in particular about RAG. Vendors are trying to overcomplicate it. They’re trying to inject their tools, their ecosystems, their vision.

It’s making RAG way more complicated than it needs to be. This tutorial is designed to help beginners learn how to build RAG applications from scratch. No fluff, no (ok, minimal) jargon, no libraries, just a simple step by step RAG application.

Jerry from LlamaIndex advocates for building things from scratch to really understand the pieces. Once you do, using a library like LlamaIndex makes more sense.

Build from scratch to learn, then build with libraries to scale.

Let’s get started!

You may or may not have heard of Retrieval Augmented Generation or RAG.

Here’s the definition from Facebook’s blog post introducing the concept:

Building a model that researches and contextualizes is more challenging, but it’s essential for future advancements. We recently made substantial progress in this realm with our Retrieval Augmented Generation (RAG) architecture, an end-to-end differentiable model that combines an information retrieval component (Facebook AI’s dense-passage retrieval system) with a seq2seq generator (our Bidirectional and Auto-Regressive Transformers [BART] model). RAG can be fine-tuned on knowledge-intensive downstream tasks to achieve state-of-the-art results compared with even the largest pretrained seq2seq language models. And unlike these pretrained models, RAG’s internal knowledge can be easily altered or even supplemented on the fly, enabling researchers and engineers to control what RAG knows and doesn’t know without wasting time or compute power retraining the entire model.

Wow, that’s a mouthful.

Simplified for beginners, the essence of RAG is this: you add your own data (via a retrieval tool) to the prompt that you pass into a large language model, and the model generates an output grounded in that data. That gives you several benefits:

  • You can include facts in the prompt to help the LLM avoid hallucinations
  • You can (manually) refer to sources of truth when responding to a user query, helping to double check any potential issues.
  • You can leverage data that the LLM might not have been trained on.

At its core, a RAG system needs just three components:

  • A collection of documents (formally called a corpus)
  • An input from the user
  • A similarity measure between the collection of documents and the user input

Yes, it’s that simple.

To start learning and understanding RAG based systems, you don’t need a vector store, you don’t even need an LLM (at least to learn and understand conceptually).

While it is often portrayed as complicated, it doesn’t have to be.

We’ll perform the following steps in sequence.

  1. Receive a user input
  2. Perform our similarity measure
  3. Post-process the user input and the fetched document(s).

The post-processing is done with an LLM.
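Those three steps can be sketched as a small skeleton (the function names here are mine, and the similarity and post-processing pieces are passed in as stubs we’ll fill in as we go):

```python
# Skeleton of the three-step flow. `similarity` and `post_process` are
# supplied as functions so they can be swapped out later.
def retrieve(query, corpus, similarity):
    # Step 2: score every document in the corpus against the user input
    scores = [similarity(query, doc) for doc in corpus]
    # keep the single best-scoring document
    return corpus[scores.index(max(scores))]

def rag_pipeline(query, corpus, similarity, post_process):
    document = retrieve(query, corpus, similarity)  # Steps 1 and 2
    return post_process(query, document)            # Step 3: an LLM, later
```

Everything that follows is just filling in these stubs.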

The actual RAG paper is obviously the canonical resource. The problem is that it assumes a LOT of context. It’s more complicated than we need it to be.

For instance, the paper includes an overview diagram of the full RAG system (not reproduced here).

That’s dense.

It’s great for researchers but for the rest of us, it’s going to be a lot easier to learn step by step by building the system ourselves.

Let’s get back to building RAG from scratch, step by step. Here are the simplified steps that we’ll be working through. While this isn’t technically “RAG,” it’s a good simplified model to learn with that will allow us to progress to more complicated variations.

Below you can see that we’ve got a simple corpus of ‘documents’ (please be generous 😊).

corpus_of_documents = [
    "Take a leisurely walk in the park and enjoy the fresh air.",
    "Visit a local museum and discover something new.",
    "Attend a live music concert and feel the rhythm.",
    "Go for a hike and admire the natural scenery.",
    "Have a picnic with friends and share some laughs.",
    "Explore a new cuisine by dining at an ethnic restaurant.",
    "Take a yoga class and stretch your body and mind.",
    "Join a local sports league and enjoy some friendly competition.",
    "Attend a workshop or lecture on a topic you're interested in.",
    "Visit an amusement park and ride the roller coasters."
]

Now we need a way of measuring the similarity between the user input we’re going to receive and the collection of documents that we organized. Arguably the simplest similarity measure is Jaccard similarity. I’ve written about it in the past (see this post), but the short answer is that Jaccard similarity is the size of the intersection divided by the size of the union of the “sets” of words.

This allows us to compare our user input with the source documents.

Side note: preprocessing

A challenge is that if we have a plain string like “Take a leisurely walk in the park and enjoy the fresh air.”, we’re going to have to pre-process it into a set of words so that we can perform these comparisons. We’re going to do this in the simplest way possible: lowercase the text and split on spaces (" ").

def jaccard_similarity(query, document):
    # lowercase and split into words, then compare as sets
    query = query.lower().split(" ")
    document = document.lower().split(" ")
    intersection = set(query).intersection(set(document))
    union = set(query).union(set(document))
    return len(intersection) / len(union)
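To sanity-check the measure, here’s a quick example (the Jaccard function is repeated so the snippet runs on its own):

```python
def jaccard_similarity(query, document):
    query = query.lower().split(" ")
    document = document.lower().split(" ")
    intersection = set(query).intersection(set(document))
    union = set(query).union(set(document))
    return len(intersection) / len(union)

# "hike" is the only shared word: 1 shared word over 12 unique words total
score = jaccard_similarity("I like to hike", "Go for a hike and admire the natural scenery.")
print(round(score, 3))  # 0.083
```

A score of 0 means no words in common; a score of 1 means the two word sets are identical.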

Now we need to define a function that takes in the exact query and our corpus and selects the ‘best’ document to return to the user.

def return_response(query, corpus):
    similarities = []
    for doc in corpus:
        similarity = jaccard_similarity(query, doc)
        similarities.append(similarity)
    # return the document with the highest similarity score
    return corpus[similarities.index(max(similarities))]

Now we can run it. We’ll start with a simple prompt.

user_prompt = "What is a leisure activity that you like?"

And a simple user input…

user_input = "I like to hike"

Now we can return our response.

return_response(user_input, corpus_of_documents)

'Go for a hike and admire the natural scenery.'

Congratulations, you’ve built a basic RAG application.

I got 99 problems and bad similarity is one

Now, we’ve opted for a simple similarity measure for learning. But this is going to be problematic because it’s so simple. It has no notion of semantics. It just looks at which words appear in both documents. That means that if we provide a negative example, we’re going to get the same “result” because that’s still the closest document.

user_input = "I don't like to hike"

return_response(user_input, corpus_of_documents)

'Go for a hike and admire the natural scenery.'
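To see why, compare the scores directly (Jaccard function repeated so this runs on its own). Negating the query barely moves the number, because “don’t” just adds one more word to the union:

```python
def jaccard_similarity(query, document):
    query = query.lower().split(" ")
    document = document.lower().split(" ")
    intersection = set(query).intersection(set(document))
    union = set(query).union(set(document))
    return len(intersection) / len(union)

doc = "Go for a hike and admire the natural scenery."
print(jaccard_similarity("I like to hike", doc))        # 1/12, about 0.083
print(jaccard_similarity("I don't like to hike", doc))  # 1/13, about 0.077
```

Word overlap simply can’t tell “like” from “don’t like”; only a semantic measure (like embeddings) can.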

This is a topic that’s going to come up a lot with “RAG”, but for now, rest assured that we’ll address this problem later.

At this point, we have not done any post-processing of the “document” to which we are responding. So far, we’ve implemented only the “retrieval” part of “Retrieval-Augmented Generation”. The next step is to augment generation by incorporating a large language model (LLM).

To do this, we’re going to use Ollama to get up and running with an open-source LLM on our local machine. We could just as easily use OpenAI’s GPT-4 or Anthropic’s Claude, but for now we’ll start with the open-source Llama 2 from Meta AI.

This post is going to assume some basic knowledge of large language models, so let’s get right to querying this model.

import requests
import json

First we’re going to define the inputs. To work with this model, we’re going to:

  1. take the user input,
  2. fetch the most similar document (as measured by our similarity measure),
  3. pass that into a prompt to the language model,
  4. then return the result to the user.

That introduces a new term, the prompt. In short, it’s the instructions that you provide to the LLM.

When you run this code, you’ll see the streaming result. Streaming is important for user experience.

user_input = "I like to hike"

relevant_document = return_response(user_input, corpus_of_documents)

full_response = []

prompt = """You are a bot that makes recommendations for activities. You answer in very short sentences and do not include extra information.
This is the recommended activity: {relevant_document}
The user input is: {user_input}
Compile a recommendation to the user based on the recommended activity and the user input."""
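For clarity, this is what the filled-in template looks like for our example (template repeated so the snippet runs on its own; the example values are the ones from earlier in the post):

```python
prompt = """You are a bot that makes recommendations for activities. You answer in very short sentences and do not include extra information.
This is the recommended activity: {relevant_document}
The user input is: {user_input}
Compile a recommendation to the user based on the recommended activity and the user input."""

# fill the template with the retrieved document and the user input
final_prompt = prompt.format(
    relevant_document="Go for a hike and admire the natural scenery.",
    user_input="I like to hike",
)
print(final_prompt)
```

The LLM only ever sees this one flat string; the “retrieval” part of RAG is just deciding what to splice into it.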

Having defined that, let’s now make the API call to Ollama (and llama2). An important step is to make sure that Ollama is already running on your local machine: run ollama serve.

Note: this might be slow on your machine, it’s certainly slow on mine. Be patient, young grasshopper.

url = 'http://localhost:11434/api/generate'

data = {"model": "llama2", "prompt": prompt.format(user_input=user_input, relevant_document=relevant_document)}

headers = {'Content-Type': 'application/json'}

response = requests.post(url, data=json.dumps(data), headers=headers, stream=True)

try:
    count = 0
    for line in response.iter_lines():
        # filter out keep-alive new lines
        if line:
            decoded_line = json.loads(line.decode('utf-8'))
            full_response.append(decoded_line['response'])
            # count += 1
            # if count % 5 == 0:
            #     print(decoded_line['response'])  # print every fifth token
finally:
    response.close()

print(''.join(full_response))

Great! Based on your interest in hiking, I recommend trying out the nearby trails for a challenging and rewarding experience with breathtaking views…
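If you’d rather skip handling the stream, Ollama’s /api/generate endpoint also accepts "stream": false and returns one complete JSON object. A minimal sketch, using only the standard library so it stands alone (the helper names here are mine):

```python
import json
import urllib.request

def build_payload(prompt, model="llama2"):
    # "stream": False asks Ollama for a single complete JSON response
    # instead of a stream of partial ones
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt, model="llama2", url="http://localhost:11434/api/generate"):
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt, model)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        # the full completion arrives in the "response" field
        return json.loads(response.read())["response"]
```

Calling generate(prompt.format(user_input=user_input, relevant_document=relevant_document)) would return the whole recommendation as one string, at the cost of the streaming user experience discussed above.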





