Build RAG applications using Jina Embeddings v2 on Amazon SageMaker JumpStart

June 6, 2024
in Data Science & ML


Exciting news! The Jina Embeddings v2 model, developed by Jina AI, is now available to customers through Amazon SageMaker JumpStart, where it can be deployed for model inference with a single click. The model supports a context length of up to 8,192 tokens. You can deploy it quickly via SageMaker JumpStart, a machine learning (ML) hub that offers foundation models, built-in algorithms, and prebuilt ML solutions deployable with minimal effort.

Text embedding involves converting text into numerical representations within a high-dimensional vector space. Text embeddings have a wide range of applications in enterprise artificial intelligence (AI), including multimodal search for ecommerce, content personalization, recommender systems, and data analytics.
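As a rough illustration (the three-dimensional vectors below are made up for readability; real embedding models produce hundreds of dimensions), semantically related texts map to nearby points in the vector space, and proximity is typically measured with cosine similarity:

import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity: close to 1.0 for similar directions, low for unrelated ones
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

query = np.array([0.12, 0.85, -0.33])    # e.g. "How do I return an item?"
doc_a = np.array([0.10, 0.80, -0.30])    # e.g. a returns-policy paragraph
doc_b = np.array([-0.70, 0.05, 0.60])    # e.g. an unrelated earnings report

print(cosine_similarity(query, doc_a))   # high score: related content
print(cosine_similarity(query, doc_b))   # low score: unrelated content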

Jina Embeddings v2 is a collection of high-performing text embedding models developed by Berlin-based Jina AI, known for strong results on a range of public benchmarks.

In this article, we will guide you through discovering and deploying the jina-embeddings-v2 model as part of a Retrieval Augmented Generation (RAG)-based question answering system in SageMaker JumpStart. This tutorial can serve as a starting point for building chatbot-based solutions for customer service, internal support, and question answering systems utilizing internal and private documents.

Understanding RAG

RAG is the process of enhancing the output of a large language model (LLM) by referencing a credible knowledge base outside of its training data sources before generating a response.

LLMs are trained on vast amounts of data and utilize billions of parameters to generate original output for tasks like answering questions, translating languages, and completing sentences. RAG extends the capabilities of LLMs to specific domains or an organization’s internal knowledge base without the need for retraining the model. It provides a cost-effective way to enhance LLM output to remain relevant, accurate, and useful in various contexts.
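Conceptually, every RAG pipeline follows the same three steps: retrieve, augment, generate. Here is a minimal sketch of that flow, where retriever and llm are placeholder objects standing in for a vector store client and an LLM endpoint (not a specific library API):

def answer_with_rag(question, retriever, llm, top_k=3):
    # 1) Retrieval: fetch the top-k most relevant documents from an
    #    external knowledge base, typically a vector database.
    context_docs = retriever.search(question, top_k=top_k)

    # 2) Augmentation: pack the retrieved documents into the prompt.
    context = "\n".join(context_docs)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\n\nAnswer:"

    # 3) Generation: the LLM answers grounded in the supplied context.
    return llm.generate(prompt)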

Benefits of Jina Embeddings v2 for RAG Applications

A RAG system uses a vector database as a knowledge retriever. It extracts a query from a user's prompt and sends it to the vector database to reliably retrieve semantically relevant information. The diagram below illustrates the architecture of a RAG application with Jina AI and Amazon SageMaker.

Experienced ML practitioners favor Jina Embeddings v2 for the following reasons:

  • State-of-the-art performance on various text embedding benchmarks
  • Long input-context length of 8,192 tokens
  • Support for bilingual text input with specific language training
  • Cost-effectiveness in operating with small models and compact embedding vectors

Introduction to SageMaker JumpStart

SageMaker JumpStart offers ML practitioners a range of top-performing foundation models. Developers can deploy these models to dedicated SageMaker instances within a network-isolated environment and customize them using SageMaker for training and deployment.

You can now easily discover and deploy a Jina Embeddings v2 model with Amazon SageMaker Studio or programmatically through the SageMaker Python SDK. This allows you to leverage model performance and MLOps controls with SageMaker features like Amazon SageMaker Pipelines and Amazon SageMaker Debugger. With SageMaker JumpStart, the model is deployed in a secure AWS environment under your VPC controls for enhanced data security.

Jina Embeddings models are available in AWS Marketplace for seamless integration into your deployments when working in SageMaker.

AWS Marketplace enables you to find third-party software, data, and services that run on AWS and manage them from a central location. It offers a wide array of software listings with flexible pricing options and deployment methods that simplify software licensing and procurement.

Overview of the Solution

A notebook is available to create and run a RAG question answering system using Jina Embeddings and the Mistral 7B-Instruct LLM in SageMaker JumpStart.

This post provides an outline of the key steps required to bring a RAG application to life using generative AI models on SageMaker JumpStart. While some code and installation steps are omitted for readability, you can access the full Python notebook for execution.

Connecting to a Jina Embeddings v2 endpoint

To begin working with Jina Embeddings v2 models:

1. In SageMaker Studio, navigate to JumpStart.
2. Search for "jina" to find Jina AI's provider page and available models.
3. Select Jina Embeddings v2 Base – en for English-language embeddings.
4. Choose Deploy.
5. In the dialog box that opens, subscribe to the model on AWS Marketplace.
6. Return to SageMaker Studio and choose Deploy again.
7. Select an instance type and enter a name for the endpoint.
8. Choose Deploy.

Once the endpoint is created, you can connect to it using the provided code snippet:

from jina_sagemaker import Client

client = Client(region_name=region)
endpoint_name = "my-jina-embeddings-endpoint"

client.connect_to_endpoint(endpoint_name=endpoint_name)
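Once connected, you can request embeddings from the endpoint. The embed call and response shape below follow the indexing code shown later in this post:

# Embed a single text; the response contains one entry per input text,
# each carrying an 'embedding' vector.
response = client.embed(texts=["What is retrieval augmented generation?"])
embedding = response[0]["embedding"]
print(len(embedding))  # dimensionality of the returned embedding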

Preparing a dataset for indexing

This post uses a public dataset from Kaggle (CC0: Public Domain) containing audio transcripts from the Kurzgesagt – In a Nutshell YouTube channel.

Each row in the dataset includes the video title, URL, and transcript text.

Use the following code to chunk the transcripts before indexing, so that retrieval surfaces only the content relevant to answering a user's query:

def chunk_text(text, max_words=1024):
    """
    Divide text into chunks, where each chunk contains the maximum
    number of full sentences that stays under `max_words` words.
    """
    sentences = text.split('.')
    chunk = []
    word_count = 0

    for sentence in sentences:
        sentence = sentence.strip()
        if not sentence:
            continue

        words_in_sentence = len(sentence.split())
        if word_count + words_in_sentence <= max_words:
            # The sentence still fits into the current chunk
            chunk.append(sentence)
            word_count += words_in_sentence
        else:
            # Emit the current chunk and start a new one with this sentence
            if chunk:
                yield '. '.join(chunk).strip() + '.'
            chunk = [sentence]
            word_count = words_in_sentence

    if chunk:
        yield '. '.join(chunk).strip() + '.'

The max_words parameter sets the maximum number of words per indexed chunk. Many chunking strategies exist beyond this simple word limit, but we use this technique in this post for simplicity; an example is shown below.
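For example, applied to a short illustrative text (not from the dataset), chunk_text groups complete sentences until the word budget is exhausted:

transcript = (
    "Climate change is one of the biggest challenges we face. "
    "Individual actions matter. But systemic change is essential."
)

for chunk in chunk_text(transcript, max_words=10):
    print(chunk)
# -> "Climate change is one of the biggest challenges we face."
# -> "Individual actions matter. But systemic change is essential."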

Index text embeddings for vector search

After you chunk the transcript text, you obtain embeddings for each chunk and link each chunk back to the original transcript and video title:

import numpy as np
from tqdm import tqdm

tqdm.pandas()  # enables DataFrame.progress_apply

def generate_embeddings(text_df):
    """
    Generate an embedding for each chunk created in the previous step.
    """
    chunks = list(chunk_text(text_df['Text']))
    embeddings = []

    for chunk in chunks:
        response = client.embed(texts=[chunk])
        chunk_embedding = response[0]['embedding']
        embeddings.append(np.array(chunk_embedding))

    text_df['chunks'] = chunks
    text_df['embeddings'] = embeddings
    return text_df

print("Embedding text chunks ...")
df = df.progress_apply(generate_embeddings, axis=1)

The dataframe df now contains an embeddings column that can be loaded into any vector database of your choice. Embeddings can then be retrieved from the vector database using a function such as find_most_similar_transcript_segment(query, n), which retrieves the n chunks closest to a user's input query.
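The post leaves the choice of vector database open. As a minimal in-memory sketch, assuming the chunks and embeddings stay in the dataframe built above (with its default integer index), the retrieval function could rank chunks by cosine similarity:

def find_most_similar_transcript_segment(query, n=3):
    # Embed the query with the same model used for the chunks
    query_embedding = np.array(client.embed(texts=[query])[0]['embedding'])

    scored = []
    for idx, row in df.iterrows():
        for chunk, chunk_embedding in zip(row['chunks'], row['embeddings']):
            # Cosine similarity between the query and the chunk
            score = np.dot(query_embedding, chunk_embedding) / (
                np.linalg.norm(query_embedding) * np.linalg.norm(chunk_embedding)
            )
            scored.append((chunk, idx, score))

    # Return the n best (segment, row-index) pairs, most similar first
    scored.sort(key=lambda item: item[2], reverse=True)
    return [(chunk, idx) for chunk, idx, _ in scored[:n]]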

Prompt a generative LLM endpoint

For question answering based on an LLM, you can use the Mistral 7B-Instruct model on SageMaker JumpStart:

from sagemaker.jumpstart.model import JumpStartModel
from string import Template

# Define the LLM to be used and deploy it through JumpStart
jumpstart_model = JumpStartModel(model_id="huggingface-llm-mistral-7b-instruct", role=role)
model_predictor = jumpstart_model.deploy()

# Define the prompt template to be passed to the LLM
prompt_template = Template("""
<s>[INST] Answer the question below only using the given context.
The question from the user is based on transcripts of videos from a YouTube
channel.
The context is presented as a ranked list of information in the form of
(video-title, transcript-segment), that is relevant for answering the
user's question.
The answer should only use the presented context. If the question cannot be
answered based on the context, say so.

Context:
1. Video-title: $title_1, transcript-segment: $segment_1
2. Video-title: $title_2, transcript-segment: $segment_2
3. Video-title: $title_3, transcript-segment: $segment_3

Question: $question

Answer: [/INST]
""")

Query the LLM

Now, for a query sent by a user, you first find the n semantically closest transcript chunks from any Kurzgesagt video (using the vector distance between the chunk embeddings and the embedding of the user's query), and provide those chunks as context to the LLM for answering the query:

# Define the query and insert it into the prompt template,
# together with the retrieved context used to answer the question
question = "Can climate change be reversed by individuals' actions?"
search_results = find_most_similar_transcript_segment(question)

prompt_for_llm = prompt_template.substitute(
    question=question,
    title_1=df.iloc[search_results[0][1]]["Title"].strip(),
    segment_1=search_results[0][0],
    title_2=df.iloc[search_results[1][1]]["Title"].strip(),
    segment_2=search_results[1][0],
    title_3=df.iloc[search_results[2][1]]["Title"].strip(),
    segment_3=search_results[2][0]
)

# Generate the answer to the question passed in the prompt
payload = {"inputs": prompt_for_llm}
model_predictor.predict(payload)
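The structure of the response depends on the serving container. For the Hugging Face LLM containers used by JumpStart, it is typically a list of records with a generated_text field; a hedged example of extracting the answer:

response = model_predictor.predict(payload)
# Typical Hugging Face LLM container output: [{"generated_text": "..."}]
# Adjust the parsing if your container returns a different structure.
answer = response[0]["generated_text"]
print(answer)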

Based on the preceding question, the LLM might respond with an answer such as the following:

Based on the provided context, it does not seem that individuals can solve climate change solely through their personal actions. While personal actions such as using renewable energy sources and reducing consumption can contribute to mitigating climate change, the context suggests that larger systemic changes are necessary to address the issue fully.

Clean up

After you're done running the notebook, make sure to delete all the resources that you created in the process so that you stop incurring charges. Use the following code:

model_predictor.delete_model()
model_predictor.delete_endpoint()

Conclusion

By combining the capabilities of Jina Embeddings v2 with the streamlined access to state-of-the-art foundation models in SageMaker JumpStart, developers and businesses can now build sophisticated RAG applications with ease.

Jina Embeddings v2's extended context length, bilingual document support, and small model size enable enterprises to quickly build natural language processing use cases on their internal datasets without relying on external APIs.

Get started with SageMaker JumpStart today, and refer to the GitHub repository for the complete code to run this sample.

Connect with Jina AI

Jina AI remains committed to leadership in bringing affordable and accessible AI embeddings technology to the world.


